Another Probability Error in Medicine (Golden Ratio)

In Yalta, K., Ozturk, S., & Yetkin, E. (2016), “Golden Ratio and the heart: A review of divine aesthetics”, International Journal of Cardiology 214, 107-112, the authors compute the ambulatory ratio of systolic to diastolic blood pressure by averaging each and then taking the ratio: “Mean values of diastolic and systolic pressure levels during 24-h, day-time and night-time recordings were assessed to calculate the ratios of SBP/DBP and DBP/PP in these particular periods”.

The error is to compute the mean SBP and the mean DBP and then take their ratio, rather than compute SBP/DBP for every data point and average those ratios. Simply,

\(\frac{\frac{1}{n}\sum_{i=1}^n x_i}{\frac{1}{n}\sum_{i=1}^n y_i}\neq \frac{\sum _{i=1}^n \frac{x_i}{y_i}}{n}\)

Easy to see with just n=2: \(\frac{x_1+x_2}{y_1+y_2}\neq \frac{1}{2} \left(\frac{x_1}{y_1}+\frac{x_2}{y_2}\right)\).
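A minimal numeric sketch of the n=2 case. The two readings below are made-up values chosen for illustration, not data from the paper, and the function names are hypothetical:

```python
# Two hypothetical (SBP, DBP) readings; the numbers are illustrative only.
sbp = [120.0, 140.0]
dbp = [80.0, 70.0]


def ratio_of_means(x, y):
    """Mean of x divided by mean of y (the procedure criticized here)."""
    return (sum(x) / len(x)) / (sum(y) / len(y))


def mean_of_ratios(x, y):
    """Average of the pointwise ratios x_i / y_i."""
    return sum(xi / yi for xi, yi in zip(x, y)) / len(x)


print(ratio_of_means(sbp, dbp))  # 130 / 75
print(mean_of_ratios(sbp, dbp))  # (1.5 + 2.0) / 2
```

The two numbers differ even though both are built from the same two readings.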

The rest is mathematical considerations until I get real data to find the implications of this error, which seems to have seeped through the literature (we know there is an egregious mathematical error; how severe the consequences are needs to be assessed from data). For the intuition of the problem, consider that when people tell you that healthy people have an average BP of 120/80, the implicit assumption is that those whose systolic is 120 must have a diastolic of 80, and vice versa, which can only be true if the ratio is deterministic.

Clearly, from Jensen’s inequality, where \(X\) and \(Y\) are random variables, whether independent or dependent, correlated or uncorrelated, we have:

\(\mathbb{E}(X/Y) \neq \frac{\mathbb{E}(X)} {\mathbb{E}(Y)}\)

with few exceptions, such as a perfectly correlated (positively or negatively) \(X\) and \(Y\) with \(X\) exactly proportional to \(Y\), in which case the equality is forced by the fact that the ratio becomes a degenerate random variable.

Inequality: At the core lies the fundamental ratio inequality (by Jensen’s) that:

\(\frac{1}{n}\sum _{i=1}^n \frac{1}{y_i} \geq \frac{1}{ \frac{\sum _{i=1}^n y_i}{n}}\),

or \(\mathbb{E}(\frac{1}{Y})\geq\frac{1}{\mathbb{E}(Y)} \) for positive \(Y\). The proof is easy: for \(y>0\), \(\frac{1}{y}\) is a convex function of \(y\), since its second derivative \(\frac{2}{y^3}\) is positive.

So when \(X\) and \(Y\) are independent and positive, the expectation of the ratio satisfies

\(\mathbb{E}(\frac{X}{Y}) = \mathbb{E}(X) \times \mathbb{E}(\frac{1}{Y})\geq \frac{\mathbb{E}(X)} {\mathbb{E}(Y)}\)
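A quick simulation sketch of the independent case. The uniform ranges are arbitrary assumptions picked only to keep both variables positive; they are not a model of blood pressure:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Independent, strictly positive draws (ranges are illustrative assumptions).
x = rng.uniform(100.0, 140.0, size=n)
y = rng.uniform(60.0, 100.0, size=n)

mean_inv_y = np.mean(1.0 / y)           # sample version of E(1/Y)
inv_mean_y = 1.0 / np.mean(y)           # sample version of 1/E(Y)
mean_ratio = np.mean(x / y)             # sample version of E(X/Y)
ratio_means = np.mean(x) / np.mean(y)   # sample version of E(X)/E(Y)

# Exact value of E(X/Y) for these uniforms: E(X) * ln(100/60) / 40.
exact = 120.0 * np.log(100.0 / 60.0) / 40.0

print(mean_inv_y, inv_mean_y)
print(mean_ratio, ratio_means, exact)
```

The first comparison holds for any positive sample (it is the arithmetic-harmonic mean inequality); the second relies on independence and shows the ratio of means sitting below the mean of the ratios.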

Furthermore, where the two variables have support on \((-\infty, \infty)\), say \(Y\) follows a Gaussian distribution \(\mathcal{N} (\mu,\sigma)\), the mean of the ratio is not finite. How? Simply, for \(Z_1= \frac{1}{Y}\),

\(f(z_1)=\frac{e^{-\frac{(\mu z_1-1)^2}{2 \sigma ^2 z_1^2}}}{\sqrt{2 \pi } \sigma z_1^2}\), \(z_1\neq 0 \)

From there we can work out the counterintuitive result that if \(X \sim \mathcal{N}(0,\sigma_1)\) and \(Y \sim \mathcal{N}(0,\sigma_2)\) are independent,

\(\frac{X}{Y} \sim Cauchy(0,\frac{\sigma_1}{\sigma_2})\),

with infinite moments. As a nice exercise we can get the exact PDF under some correlation structure \(\rho\) in a bivariate normal:

\(f(z)= \frac{\sigma _1 \sigma _2 \sqrt{-\left(\left(\rho ^2-1\right) \left(\sigma _1^2+\sigma _2^2 z^2-2 \rho \sigma _2 \sigma _1 z\right)\right)}}{\pi \left(\sigma _1^2+\sigma _2^2 z^2-2 \rho \sigma _2 \sigma _1 z\right){}^{3/2}}\),

with a mean \(\frac{i \sqrt{\rho ^2-1} \sigma _1 \sigma _2}{\pi \left(\sigma _1^2+\sigma _2^2 z^2-2 \rho \sigma _2 \sigma _1 z\right)}\) that exists only if \(\rho=1\) (that is, it will be 0 in the exactly symmetric case).
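The ratio density above can be checked by simulation. Completing the square in its denominator shows it is a Cauchy density with location \(\rho \frac{\sigma_1}{\sigma_2}\) and scale \(\frac{\sigma_1}{\sigma_2}\sqrt{1-\rho^2}\), so the sample median and half the interquartile range should match those two numbers. This is a sketch with arbitrary illustrative parameters; `simulate_ratio` is a hypothetical helper name:

```python
import numpy as np


def simulate_ratio(sigma1, sigma2, rho, n, seed=0):
    """Draw n ratios X/Y from a zero-mean bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    cov = [[sigma1**2, rho * sigma1 * sigma2],
           [rho * sigma1 * sigma2, sigma2**2]]
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return xy[:, 0] / xy[:, 1]


sigma1, sigma2, rho = 2.0, 1.0, 0.5  # illustrative values, not calibrated
z = simulate_ratio(sigma1, sigma2, rho, n=400_000)

# Cauchy location and scale implied by the density.
location = rho * sigma1 / sigma2
scale = (sigma1 / sigma2) * np.sqrt(1.0 - rho**2)

q25, q50, q75 = np.quantile(z, [0.25, 0.5, 0.75])
print(q50, location)           # sample median vs. Cauchy location
print((q75 - q25) / 2, scale)  # half the interquartile range vs. Cauchy scale
```

Quantiles are used on purpose: they are stable for a Cauchy, whereas the sample mean of `z` does not settle down as `n` grows.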

Luckily, SBP (\(X\)) and DBP (\(Y\)) live in \((0, \infty)\), which should yield a finite mean and allow us to use the Mellin transform, a good warm-up after the holidays (while waiting for the magisterial Algebra of Random Variables to arrive by mail).

Note: For a lognormal distribution parametrized with \(\mu_1, \mu_2, \sigma_1,\sigma_2\), under independence:

\(\frac{\mathbb{E} X}{\mathbb{E} Y}= e^{-\sigma _2^2}\mathbb{E}\left(\frac{X}{Y}\right)\)

This is owing to the fact that the ratio follows another lognormal, with parameters \(\left[\mu _1-\mu _2,\sqrt{\sigma _1^2+\sigma _2^2}\right]\).
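To spell out the step, using only the standard lognormal mean \(e^{\mu+\sigma^2/2}\):

\(\mathbb{E}(X)=e^{\mu_1+\frac{\sigma_1^2}{2}}, \quad \mathbb{E}(Y)=e^{\mu_2+\frac{\sigma_2^2}{2}}, \quad \mathbb{E}\left(\frac{X}{Y}\right)=e^{\mu_1-\mu_2+\frac{\sigma_1^2+\sigma_2^2}{2}}\),

so that

\(\frac{\mathbb{E}(X)}{\mathbb{E}(Y)}=e^{\mu_1-\mu_2+\frac{\sigma_1^2-\sigma_2^2}{2}}=e^{-\sigma_2^2}\,\mathbb{E}\left(\frac{X}{Y}\right)\).

The gap between the two quantities depends only on the dispersion \(\sigma_2\) of the denominator.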

Gamma: I’ve calibrated from various papers that SBP and DBP must each follow a gamma distribution, with standard deviations of 14-24 and 10-14 mmHg respectively. There are papers on bivariate (multivariate) gamma distributions in the statistics literature (though nothing in the DSBR, the “Data Science Bullshitters Recipes”), but more on this distribution later. We can work out that if \(X \sim \mathcal{G}(a_1,b_1)\) (gamma, with shape \(a_1\) and scale \(b_1\)) and \(Y \sim \mathcal{G}(a_2, b_2)\), assuming independence (for now), the ratio \(Z = X/Y\) has density

\(f(z)=\frac{b_1^{a_2} b_2^{a_1} z^{a_1-1} \Gamma \left(a_1+a_2\right) \left(b_2 z+b_1\right){}^{-a_1-a_2}}{\Gamma \left(a_1\right) \Gamma \left(a_2\right)}\)

with mean \(\mathbb{E}(Z)= \frac{a_1 b_1}{\left(a_2-1\right) b_2}\) (for \(a_2>1\)) while \(\frac{\mathbb{E}(X)} {\mathbb{E}(Y)}= \frac{a_1 b_1}{a_2 b_2}\).
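To get a feel for the size of the gap, here is a sketch under independence. The inputs are hypothetical: means of 120 and 80 and standard deviations of 18 and 12 (picked from inside the ranges quoted above), converted to shape and scale by moment matching. `gamma_from_mean_sd` is a helper name introduced here for illustration:

```python
import numpy as np


def gamma_from_mean_sd(mean, sd):
    """Shape a and scale b of a gamma with given mean and SD (mean = a*b, var = a*b**2)."""
    shape = (mean / sd) ** 2
    scale = sd**2 / mean
    return shape, scale


# Hypothetical inputs, not calibrated to any dataset.
a1, b1 = gamma_from_mean_sd(120.0, 18.0)
a2, b2 = gamma_from_mean_sd(80.0, 12.0)

ratio_of_means = (a1 * b1) / (a2 * b2)
mean_of_ratio = (a1 * b1) / ((a2 - 1.0) * b2)  # closed form, requires a2 > 1

# Monte Carlo check of the closed form, assuming independence.
rng = np.random.default_rng(1)
x = rng.gamma(shape=a1, scale=b1, size=500_000)
y = rng.gamma(shape=a2, scale=b2, size=500_000)
simulated = np.mean(x / y)

print(ratio_of_means, mean_of_ratio, simulated)
```

With these inputs the two quantities differ by the factor \(\frac{a_2}{a_2-1}\); how much a correlation structure changes this is exactly what real data would need to settle.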

Assuming Gamma Distribution

Pierre Zalloua has promised me 10,000 BP observations so we can compute the ratios under a correlation structure.