### Introduction and Result

A maximum entropy alternative to Bayesian methods for the estimation of independent Bernouilli sums.

Let \(X=\{x_1,x_2,\ldots, x_n\}\), where \(x_i \in \{0,1\}\) be a vector representing an *n* sample of independent Bernouilli distributed random variables \(\sim \mathcal{B}(p)\). We are interested in the estimation of the probability *p*.

We propose that the probablity that provides the best statistical overview, \(p_m\) (by reflecting the **maximum ignorance** point) is

\(p_m= 1-I_{\frac{1}{2}}^{-1}(n-m, m+1)\), (1)

where \(m=\sum_i^n x_i \) and \(I_.(.,.)\) is the beta regularized function.

### Comparison to Alternative Methods

**EMPIRICAL**: The sample frequency corresponding to the “empirical” distribution \(p_s=\mathbb{E}(\frac{1}{n} \sum_i^n x_i)\), which clearly does not provide information for small samples.

**BAYESIAN**: The standard Bayesian approach is to start with, for prior, the parametrized Beta Distribution \(p \sim Beta(\alpha,\beta)\), which is not trivial: one is contrained by the fact that matching the mean and variance of the Beta distribution constrains the shape of the prior. Then it becomes convenient that the Beta, being a conjugate prior, updates into the same distribution with new parameters. Allora, with *n* samples and *m* realizations:

\(p_b \sim Beta(\alpha+m, \beta+n-m)\) (2)

with mean \(p_b = \frac{\alpha +m}{\alpha +\beta +n}\). We will see below how a low variance beta has too much impact on the result.

### Derivations

Let \(F_p(x)\) be the CDF of the binomial \( \mathcal{B} in(n,p)\). We are interested in \(\{ p: F_p(x)=q \}\) the maximum entropy probability. First let us figure out the target value *q*.

To get the maximum entropy probability, we need to maximize \(H_q=-\left(\;q \; log(q) +(1-q)\; log (1-q)\right)\). This is a very standard result: taking the first derivative w.r. to *q*, \(\log (q)+\log (1-q)=0, 0\leq q\leq 1\) and since \(H_q\) is concave to *q*, we get \(q =\frac{1}{2}\).

Now we must find *p* by inverting the CDF. Allora for the general case,

\(p= 1-I_{\frac{1}{2}}^{-1}(n-x,x+1)\).

And note that as in the graph below (thanks to comments below by