In Graphs and Distributions, I mentioned that I was struggling with an idea that didn’t pan out. Well, the donut was the troublesome idea. I finally found an explanation of why hypothesis testing doesn’t give us a donut. The null hypothesis contributes the alpha value, a radius of the null, to the test. And the alternative hypothesis contributes the beta value, a radius of the alternative, to the test. You end up with a lense, a math term, hence the spelling. Rotating that lense gives you the donut, as I originally conceived it.

In the process of trying to validate the donut idea, I read and watched many explanations of hypothesis testing. I looked into skew and kurtosis as well. I’ve mashed it all up and put it into a single, probably overloaded, diagram.

Here we have two normals separated by some distance between their means, as seen from above looking down. We test hypotheses to determine whether a correlation is statistically significant. While correlation is not causation, causation would be a vector from the mean of one normal to the mean of the other. The distance between the means creates statistical significance. Remember that statistics is all about distance.
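That distance-drives-significance claim can be sketched numerically. This is a minimal illustration, not from the post: a two-sample z-test with an assumed known sigma of 1 and an assumed 30 observations per group, showing the p-value shrinking as the distance between the means grows.

```python
import math
from statistics import NormalDist

def p_value(mean_distance, sigma=1.0, n=30):
    """Two-sided p-value for a two-sample z-test with known sigma."""
    se = sigma * math.sqrt(2 / n)          # standard error of the mean difference
    z = mean_distance / se                 # distance in standard-error units
    return 2 * (1 - NormalDist().cdf(z))   # two-sided tail probability

# Same spread, same sample size; only the distance between means changes.
for d in (0.1, 0.5, 1.0):
    print(d, round(p_value(d), 4))
```

With everything else held fixed, only the distance moves the p-value, which is the sense in which distance creates significance.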

In hypothesis testing, you set alpha, but you calculate beta. Alpha controls the probability of a false positive or type I error. Alpha rejects the tail and accepts the shoulder and core, shown in orange. Beta rejects the core and some portion of the shoulder towards the core or center, shown in yellow. Alpha and beta generate the lense shape, shown in green, representing the area where the alternative hypothesis is accepted.
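The set-alpha, calculate-beta asymmetry can be made concrete. A minimal sketch, assuming a one-sided z-test with known sigma; the specific numbers (alpha of 0.05, effect size 0.5, n of 25) are illustrative assumptions, not from the post.

```python
from statistics import NormalDist

def beta_for_test(alpha, delta, sigma, n):
    """Type II error rate for a one-sided z-test of H0: mu = 0 vs H1: mu = delta."""
    z_crit = NormalDist().inv_cdf(1 - alpha)   # alpha is set: it fixes the cutoff
    shift = delta / (sigma / n ** 0.5)         # effect size in standard errors
    return NormalDist().cdf(z_crit - shift)    # beta falls out of the calculation

beta = beta_for_test(alpha=0.05, delta=0.5, sigma=1.0, n=25)
print(round(beta, 3), "beta;", round(1 - beta, 3), "power")
```

Alpha is a choice; beta is then determined by the effect size and sample size, which is why it has to be calculated.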

I drew the core touching the lense. This may not be the case. But two authors/presenters stated that in hypothesis testing, the tails are the focus of the effort and the core is largely undifferentiated, aka not informative.

Then, I went on to skew and kurtosis. Skew moves the core. Kurtosis tells us about the shoulder and tail. The steeper and narrower the shoulder, the shorter the tail; this kurtosis is referred to as light. The shallower and wider the shoulder, the longer the tail; this kurtosis is referred to as heavy. Skewness is about location relative to the x-axis. Since the top-down view is not typical in statistics, the need for a y- or z-axis kurtosis parameter gets lost, at least at the amateur level of statistics, aka the 101 intro class. On the diagram, the brown double-ended arrow should reach across the entire circle representing the footprint of the distribution.
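The light-versus-heavy distinction shows up directly in the sample moments. A sketch, with assumed example distributions: the uniform is light-tailed (negative excess kurtosis), the Laplace heavy-tailed (positive excess kurtosis), and both are symmetric, so skewness stays near zero.

```python
import math
import random

def skewness(xs):
    """Sample skewness: third standardized central moment."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Fourth standardized central moment minus 3 (the normal's value)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

random.seed(1)
light = [random.uniform(-1, 1) for _ in range(100_000)]   # short tails, steep shoulders
heavy = []
for _ in range(100_000):                                   # Laplace via inverse CDF
    u = random.random() - 0.5
    heavy.append(math.copysign(-math.log(1 - 2 * abs(u)), u))

print("light:", round(excess_kurtosis(light), 2))   # negative: light kurtosis
print("heavy:", round(excess_kurtosis(heavy), 2))   # positive: heavy kurtosis
```

The kurtosis numbers separate cleanly while the skewness of both stays near zero, matching the post's split: skew is about location, kurtosis about shoulders and tails.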

The volumes under the shoulders and tails sum to the same value. The allocation of the variance is different, but the amount of variance is the same.
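A small numeric illustration of that reallocation, using three assumed example distributions scaled to the same variance of 1: the total is fixed, but the probability mass beyond two standard deviations differs sharply.

```python
import math
from statistics import NormalDist

x = 2.0  # distance from the mean, in standard deviations

# All three distributions have variance 1; only the allocation of mass differs.
p_normal = 2 * (1 - NormalDist().cdf(x))      # moderate tails
p_laplace = math.exp(-x * math.sqrt(2))       # heavy: extra mass pushed to the tails
p_uniform = max(0.0, 1 - x / math.sqrt(3))    # light: no mass at all beyond sqrt(3)

print(round(p_normal, 4), round(p_laplace, 4), p_uniform)
```

Same total variance, very different tail mass: the unit-variance Laplace has more than the normal beyond two sigma, and the unit-variance uniform has none.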

One of the papers I read regarding kurtosis can be found here. The author took on the typical treatment of kurtosis as defining the core by looking at the actual parameters, which are parameters about tails, to conclude that kurtosis is about tails.

Notice also that the word shoulder cropped up. I first heard of shoulders in my research into kurtosis. Kurtosis defines the shape of the shoulders. As such, it would have effects on the distribution similar to those of black swans: it changes the shape of the distribution at the shoulders and tails. Tails, further, are not the same when the distribution is skewed, but somehow this is overlooked, because there is only one skew parameter, rather than two or more. This leaves an open question as to what would change the kurtosis over time. The accumulation of data over time would change the skew and kurtosis of the distribution.

Where black swans shorten tails by moving the x-axis up or down the y-axis, kurtosis changes would happen when the probability mass moves to and from the shoulders and tails.

Regression generates a function defined as a line intersecting the mean. In the multivariate normal, there are several regressions contributing to the coverage of the variance under the normal. These regressions convert formerly stochastic variations into deterministic values. Factor analysis and principal component analysis both achieve this conversion of stochastic variation into deterministic or algebraic values. These methods consume variance.

Due to the focus of hypothesis testing being in the tails, core variance is consumed or shifted towards the tails. Alpha defines an epsilon value for the limit of the normal’s convergence with the x-axis. Alpha essentially says that if the value is smaller than alpha, ignore it, or reject it. Alpha is effectively a black swan.

Since a factor analysis discovers the largest factor first, and increasingly smaller factors as the analysis continues, it constantly pushes variance towards the bottom of the analysis. The factor analysis also acts as an epsilon limiting convergence with the x-axis again, because we typically stop the factor analysis before we’ve consumed all the variance. We are left with a layer of determinism riding on top of a layer of the stochastic, or variance. Bayesian statistics uncovers the deterministic as well.
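The largest-factor-first consumption of variance can be sketched with principal component analysis, a stand-in here for the broader factor-analysis family. Everything below is illustrative: synthetic data with two strong deterministic directions plus noise, in five dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two strong deterministic directions embedded in 5 dimensions, plus noise.
latent = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 5))
data = latent + 0.1 * rng.normal(size=(1000, 5))

cov = np.cov(data, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # largest factor first
explained = eigvals / eigvals.sum()                # share of variance consumed

print(np.round(explained, 3))
```

The first two components consume nearly all the variance; the small eigenvalues at the bottom are the residual stochastic layer we typically stop before consuming.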

A radar is basically a bunch of deterministic plumbing for the stochastic and some mechanisms for converting the shape of the stochastic into deterministic values. This layering of determinism and stochastic is typical.

One term that showed up in the discussion of skewness was direction. Note that they are not talking about direction in the sense of a Markov chain. The Markov chain is a vector representing causation where skewness does not represent causation.

The takeaway here should be that changes in skew and kurtosis will require us to retest our hypotheses just like the retesting caused by black swans. Data collection is more effective in the tails and shoulders than in the core if your intent is to discover impacts, rather than confirm prior conclusions.

Comments are welcome. Questions? What do you think about this? Thanks.