Fourth Definition of Kurtosis

In the Wikipedia topic on Moment, Kurtosis being the fourth moment, aka forth integral of the moment generating function, Wikipedia says, “The fourth central moment is a measure of the heaviness [or lightness] of the tail of the distribution, compared to the normal distribution of the same variance.” Notice here, no mention of peakedness.

In Yes or No in the Core and Tails II, I included some discussion of mixture models with a two-dimensional graphic that illustrated the summing of two distributions. The sum (red) was said to have a heavy tail.  It was interesting to see distributions in a mixture model acting as constraints. I have not been able to confirm that normals in other sums act as constraints. In a mixture model, the weights of the summed normals must add up to 1, so one normal has a weight of p, and the other would have a weight of 1-p. The yellow areas represent the constrained space. The red distribution is sandwiched between the green one and the blue one. The green normal and the blue normal are constraining the red normal.

In analysis, distribution theory is not about statistics, but rather as substitutes for functions. In linear programming, constraints are functions, so it should be of no surprise that distributions act as constraints. Statistics is full of functions like the moment function. Every time you turn around there is a new function describing the distribution. Those functions serve particular purposes.

Another view of the same underlying graph shows these normals to be events on a timeline, the normal timeline. Statistics lives in fear of p-hacking, or waiting around and continuing to collect data until statistical significance is achieved. But, what if you are not doing science. P-hacking wouldn’t pay if the people doing it were trying to make some money selling product, rather than capturing grant money. Statistics takes a batch approach to frequentist statistical inference. Everything is about data sets, aka batches of data, rather than data. But, if we could move from batch to interactive, well, that would be p-hacking. If I’m putting millions on a hypothesis, I won’t be p-hacking. If I’m putting millions on a hypothesis, I won’t use a kurtotic or skewed distribution that will disappear in just a few more data points or the next dataset. That would just be money to lose.

So what is a normal timeline? When n is low, shown by the green line in the figure, labeled A, the normal is tall, skinny ideally, ideally because it is also skewed and kurtotic which is not shown in this figure. We’ll ignore the skew and kurtosis for the moment. When n is finally high enough to be normal, shown by the red line, it is no longer tall, and not yet short. It is a standard normal. When n is higher, shown by the blue line, labeled B, the distribution is shorter and wider. So we’ve walked a Markov chain around the event of achieving normality and exceeding it. This illustrates a differential normality.

We achieve normality, then we exceed it. This is the stuff of differentials. I’ve talked about the differential geometry previously. We start out with Poisson games on the technology adoption lifecycle. These have us in a hyperbolic geometry. We pretend we are always in a Euclidean space because that is mathematically easy. But, we really are not achieving the Euclidean until our data achieves normality. Once we achieve normality, we don’t want to leave the Euclidean space, but even if we don’t, the world does, our business does. Once the sigma goes up, we find ourselves in a spherical geometry. How can so many businesses exist that sell the same given commodity in a multiplicity of ways? That’s the nature of the geodesic, the metric of spherical geometry. In a Euclidean space, there is one optimal way; in hyperbolic, less than one optimal way; and spherical, many. This is the differential geometry that ties itself to the differential normality. The differential normality that batch statistics, datasets hide. A standing question for me is whether we depart the Euclidean at one sigma or six sigma. I don’t know yet.

As a side note on mixture models like the underlying figure for the figures above, these figures show us normals that have a mean of zero, but their standard deviations differ. The first standard deviation is at the inflection point on each side of the normal distribution. The underlying figure is tricky because you would think, that all three normals intersect at the same inflection point. That might be true if all three had the same standard deviation. Since that is not the case, the inflection points will be in different places. The figure shows the inflection points on one side of the normal. When the distribution is not skewed, the inflection points on the other side of the mean are mirror images.

Mixture models can involve different distributions, not just normals. Summing is likewise not restricted to distributions having the same mean and standard deviations or being of the same kind of distributions.

Multivariable normals contain data from numerous dimensions. A single measure is tied to a single dimension. A function maps a measurement in a single dimension into another measurement in another dimension. Each variable in a multivariable normal brings its own measure, dimension, and distribution to the party. That multivariable normal sums each of those normals. Back in my statistics classes, adding normals required that they have the same mean and same standard deviation. That was long ago, longer than I think.

Enjoy.