Maximum Ignorance Probability, with application to surgery’s error rates

Introduction and Result

A maximum entropy alternative to Bayesian methods for the estimation of independent Bernoulli sums.

Let \(X=\{x_1,x_2,\ldots, x_n\}\), where \(x_i \in \{0,1\}\), be a vector representing a sample of n independent Bernoulli-distributed random variables \(\sim \mathcal{B}(p)\). We are interested in estimating the probability \(p\).

We propose that the probability \(p_m\) that provides the best statistical overview (by reflecting the maximum ignorance point) is

\(p_m= 1-I_{\frac{1}{2}}^{-1}(n-m, m+1)\), (1)

where \(m=\sum_{i=1}^n x_i \) and \(I_\cdot(\cdot,\cdot)\) is the regularized incomplete beta function.
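For concreteness, here is a minimal numerical sketch of Eq. (1), assuming SciPy is available (`betaincinv` is SciPy's inverse of the regularized incomplete beta function; the function name `max_ignorance_p` and the sample numbers are mine, for illustration only):

```python
from scipy.special import betaincinv

def max_ignorance_p(n, m):
    """Maximum-ignorance estimate p_m = 1 - I^{-1}_{1/2}(n - m, m + 1), Eq. (1).

    Requires 0 <= m < n so the first beta parameter n - m is positive.
    """
    if not 0 <= m < n:
        raise ValueError("need 0 <= m < n")
    return 1.0 - betaincinv(n - m, m + 1, 0.5)

# Illustrative numbers: 0 adverse events observed in 10 surgeries.
print(max_ignorance_p(10, 0))  # ~0.067, rather than the empirical 0.0
```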

Comparison to Alternative Methods

EMPIRICAL: The sample frequency corresponding to the “empirical” distribution, \(p_s=\frac{1}{n} \sum_{i=1}^n x_i\), which clearly does not provide information for small samples: with \(m=0\) it yields \(p_s=0\) regardless of \(n\).

BAYESIAN: The standard Bayesian approach is to start with, as prior, the parametrized Beta distribution \(p \sim Beta(\alpha,\beta)\), which is not trivial: one is constrained by the fact that matching a desired mean and variance pins down the shape of the prior. It then becomes convenient that the Beta, being a conjugate prior, updates into the same distribution with new parameters. Then, with n samples and m realizations:

\(p_b \sim Beta(\alpha+m, \beta+n-m)\) (2)

with posterior mean \(\mathbb{E}(p_b) = \frac{\alpha +m}{\alpha +\beta +n}\). We will see below how a low-variance Beta prior has too much impact on the result.
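To make the prior's pull concrete, here is a small sketch (plain Python; the numbers are illustrative, not from the post) contrasting the posterior mean from Eq. (2) under two priors with the same mean \(\frac{1}{2}\) but different variances:

```python
def posterior_mean(alpha, beta, n, m):
    """Posterior mean of p under a Beta(alpha, beta) prior after
    observing m successes in n Bernoulli trials, per Eq. (2)."""
    return (alpha + m) / (alpha + beta + n)

n, m = 10, 0  # illustrative: no events in 10 trials

# Flat prior Beta(1, 1): high variance, weak pull toward 1/2.
print(posterior_mean(1, 1, n, m))    # ~0.083

# Beta(50, 50): same mean 1/2 but low variance; it dominates the data.
print(posterior_mean(50, 50, n, m))  # ~0.455
```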

Derivations

Let \(F_p(x)\) be the CDF of the binomial \(\mathcal{B}in(n,p)\). We are interested in \(\{ p: F_p(x)=q \}\), the maximum entropy probability. First let us figure out the target value \(q\).

To get the maximum entropy probability, we need to maximize the entropy \(H_q=-\left(q \log(q) +(1-q) \log (1-q)\right)\). This is a very standard result: setting the first derivative with respect to \(q\) to zero gives \(\frac{dH_q}{dq}=\log(1-q)-\log(q)=0\) for \(0 < q < 1\), and since \(H_q\) is concave in \(q\), we get \(q =\frac{1}{2}\).

Now we must find \(p\) by inverting the CDF. Then, for the general case,

\(p= 1-I_{\frac{1}{2}}^{-1}(n-x,x+1)\).
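As a numerical check, one can use the standard identity \(F_p(x)=I_{1-p}(n-x,\,x+1)\) to verify that the binomial CDF equals \(\frac{1}{2}\) at this \(p\). A sketch assuming SciPy, with arbitrary illustrative values of \(n\) and \(x\):

```python
from scipy.special import betaincinv
from scipy.stats import binom

n, x = 20, 3  # arbitrary illustrative values
p = 1.0 - betaincinv(n - x, x + 1, 0.5)

# By the identity F_p(x) = I_{1-p}(n - x, x + 1), this CDF should be 1/2.
print(binom.cdf(x, n, p))  # ~0.5
```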

[A graph illustrating the result appeared here, with credit to the commenters below.]

5 thoughts on “Maximum Ignorance Probability, with application to surgery’s error rates”

  1. Nassim:
    It’s not clear you can come up with much of a “best estimate” from data with y=0. Maybe all that can be given in general (without using some prior information on p) are reasonable bounds. I like the Agresti-Coull procedure which gives a reasonable 95% interval. I’ve used this in a consulting project (in that case, the data were y=75 out of n=75) and put it in Regression and Other Stories.

  2. The problem of bounds requires a knowledge of \(p\). But there is such a thing as a maximum ignorance range, say between .25 and .75. Replace \(\frac{1}{2}\) by \(\frac{1}{4}\) and \(\frac{3}{4}\), and there is a band; see the sketch after these comments.

  3. Excuse me, Maestro, but don’t you mean “where \(m \geq \sum x_i\)”, i.e. at most \(m\) dead patients?
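A sketch of the band suggested in comment 2, under the same SciPy assumption as above (the helper name `ignorance_band` and the numbers are illustrative): since \(F_p(m)\) decreases in \(p\), solving at \(q=\frac{3}{4}\) and \(q=\frac{1}{4}\) gives the lower and upper ends.

```python
from scipy.special import betaincinv

def ignorance_band(n, m, q_low=0.25, q_high=0.75):
    """Band of p values for which the binomial CDF F_p(m) lies
    between q_low and q_high.

    F_p(m) is decreasing in p, so the higher quantile gives the lower p.
    """
    p_lo = 1.0 - betaincinv(n - m, m + 1, q_high)
    p_hi = 1.0 - betaincinv(n - m, m + 1, q_low)
    return p_lo, p_hi

print(ignorance_band(10, 0))  # roughly (0.028, 0.129)
```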
