### Introduction and Result

A maximum entropy alternative to Bayesian methods for the estimation of independent Bernoulli sums.

Let \(X=\{x_1,x_2,\ldots, x_n\}\), where \(x_i \in \{0,1\}\), be a vector representing an *n*-sample of independent Bernoulli-distributed random variables \(\sim \mathcal{B}(p)\). We are interested in estimating the probability *p*.

We propose that the probability that provides the best statistical overview, \(p_m\) (by reflecting the *maximum ignorance* point), is

\(p_m= 1-I_{\frac{1}{2}}^{-1}(n-m, m+1)\), (1)

where \(m=\sum_{i=1}^n x_i\) and \(I_.(.,.)\) is the regularized incomplete beta function.

### Comparison to Alternative Methods

**EMPIRICAL**: The sample frequency corresponding to the “empirical” distribution, \(p_s=\mathbb{E}(\frac{1}{n} \sum_{i=1}^n x_i)\), which is clearly uninformative for small samples.

**BAYESIAN**: The standard Bayesian approach starts with a parametrized Beta distribution as prior, \(p \sim Beta(\alpha,\beta)\). The choice is not trivial: matching the mean and variance of the Beta distribution constrains the shape of the prior. It is then convenient that the Beta, being a conjugate prior, updates into the same distribution with new parameters. So, with *n* samples and *m* realizations:

\(p_b \sim Beta(\alpha+m, \beta+n-m)\) (2)

with mean \(p_b = \frac{\alpha +m}{\alpha +\beta +n}\). We will see below how a low-variance Beta prior has too much impact on the result.
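As a rough illustration of the two estimators above, the sketch below evaluates the sample frequency and the posterior mean \(\frac{\alpha+m}{\alpha+\beta+n}\) for a small sample. The function names and the example values (\(n=10\), \(m=0\), priors \(Beta(1,1)\) and \(Beta(50,50)\)) are assumptions chosen for illustration; they do not come from the text.

```python
def empirical_frequency(m, n):
    """Sample frequency m / n."""
    return m / n


def beta_posterior_mean(m, n, alpha, beta):
    """Mean of Beta(alpha + m, beta + n - m), the conjugate update in (2)."""
    return (alpha + m) / (alpha + beta + n)


# Hypothetical small sample: n = 10 trials, m = 0 realizations.
n, m = 10, 0
print("empirical:        ", empirical_frequency(m, n))
print("flat Beta(1,1):   ", beta_posterior_mean(m, n, 1, 1))
print("tight Beta(50,50):", beta_posterior_mean(m, n, 50, 50))
```

With these illustrative numbers the tight (low-variance) prior keeps the posterior mean close to the prior mean of 0.5, while the flat prior moves most of the way toward the data.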

### Derivations

Let \(F_p(x)\) be the CDF of the binomial \(\mathcal{B}in(n,p)\). We are interested in \(\{ p: F_p(x)=q \}\), the maximum entropy probability. First let us figure out the target value *q*.

To get the maximum entropy probability, we need to maximize \(H_q=-\left(q \log(q) +(1-q) \log (1-q)\right)\). This is a very standard result: taking the first derivative with respect to *q* and setting it to zero gives \(\log (1-q)-\log (q)=0\), \(0\leq q\leq 1\), and since \(H_q\) is concave in *q*, we get \(q =\frac{1}{2}\).
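The result \(q=\frac{1}{2}\) can be checked numerically. The sketch below evaluates \(H_q\) on a grid and picks the maximizer; the grid size and the function name `binary_entropy` are illustrative choices, not part of the derivation.

```python
import math


def binary_entropy(q):
    """H_q = -(q log q + (1 - q) log(1 - q)), with H_0 = H_1 = 0 by convention."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log(q) + (1 - q) * math.log(1 - q))


# Evaluate H_q on an evenly spaced grid over [0, 1] and keep the maximizer.
grid = [i / 1000 for i in range(1001)]
q_star = max(grid, key=binary_entropy)
print(q_star, binary_entropy(q_star))
```

The maximum value on the grid equals \(\log 2\), attained at the midpoint.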

Now we must find *p* by inverting the CDF. So, for the general case,

\(p= 1-I_{\frac{1}{2}}^{-1}(n-x,x+1)\).
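One way to check this inversion without a special-function library is to solve \(F_p(x)=q\) for *p* directly. The sketch below does so by bisection, using only the Python standard library. The names `binom_cdf` and `max_entropy_p`, the tolerance, and the restriction to \(0 \le x < n\) (so that a root exists in \((0,1)\)) are assumptions of this sketch. It is a numerical stand-in for the inverse regularized beta function, not a replacement for it.

```python
import math


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def max_entropy_p(x, n, q=0.5, tol=1e-12):
    """Solve F_p(x) = q for p by bisection.

    F_p(x) decreases from 1 to 0 as p goes from 0 to 1 when 0 <= x < n,
    so there is a single root in (0, 1).
    """
    if not 0 <= x < n:
        raise ValueError("this sketch assumes 0 <= x < n")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > q:
            lo = mid  # CDF still above q: p is too small
        else:
            hi = mid  # CDF at or below q: p is too large
    return (lo + hi) / 2


# Hypothetical example: no realizations in 10 trials.
print(max_entropy_p(0, 10))
```

For \(x=0\) the equation reduces to \((1-p)^n=\frac{1}{2}\), so the result can be compared with the closed form \(p=1-2^{-1/n}\).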

And note that as in the graph below (thanks to comments below by

Nassim:

It’s not clear you can come up with much of a “best estimate” from data with y=0. Maybe all that can be given in general (without using some prior information on p) are reasonable bounds. I like the Agresti-Coull procedure which gives a reasonable 95% interval. I’ve used this in a consulting project (in that case, the data were y=75 out of n=75) and put it in Regression and Other Stories.

Thanks, will update with a comment on bounds.

The problem of bounds requires a knowledge of \(p\). But there is such a thing as a maximum ignorance range, say between .25-.75. Replace \(\frac{1}{2}\) by \(\frac{1}{4}\) and \(\frac{3}{4}\), and there is a band.
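As a small illustration of such a band, take the special case of zero realizations, where \(F_p(0)=(1-p)^n=q\) has the closed-form solution \(p=1-q^{1/n}\). The sketch below is restricted to that case. The function name and the sample size \(n=75\) are assumptions chosen for illustration.

```python
def zero_count_p(n, q):
    """Solve (1 - p)^n = q for p: the value of p at which P(X = 0) equals q."""
    return 1 - q ** (1 / n)


n = 75  # hypothetical sample size with zero realizations
lower = zero_count_p(n, 0.75)   # q = 3/4 gives the lower end of the band
center = zero_count_p(n, 0.5)   # q = 1/2 is the maximum ignorance point
upper = zero_count_p(n, 0.25)   # q = 1/4 gives the upper end of the band
print(lower, center, upper)
```

For \(n=1\) the band is exactly \([0.25, 0.75]\), and it narrows toward zero as \(n\) grows.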

Excuse me, Maestro, but don’t you mean “where \(m \geq \sum x_i\)”, i.e. at most \(m\) dead patients?

Wouldn’t it make more sense to use \(p_m = I^{-1}_{1/2}(m+1, n-m)\)? Same results, more coherent with the fact that you are inverting the CDF.