The ambiguous middle of my decisions tree for my last post “Yes or No in the Core and Tails” has bugged me for a few days. I have a hard time thinking that I drive up to a canyon via a few roads, climb down to the river, cross the river, climb up the other side, and select one of many roads before driving off. That is not a reasonable way to deal with a decision tree that doesn’t get entirely covered by my sample space.

So what is this mess hinting at? **Do not stop sampling just because you’ve achieved normality!** Keep on sampling until you’ve covered the entire sample space. Figure out what power of 2 will produce a decision tree wide enough to contain the sample space, then sample the entire decision tree. Before normality is achieved, not sampling the entire base of the decision tree generates a skewed normal. This exposes you to skew risk. There will also be some excess kurtosis, which brings with it kurtosis risk.

Here is a quick table you can use to find the size of the sample space after you’ve found the number of samples you need to achieve normality. The sample space is a step function. Each step has two constraints.

Given that it takes less than 2048 samples to achieve a normal, that should be the maximum. 2^{11} should be the largest binary sample space that you would need, hence the red line. We can’t get more resolution with larger sample spaces.

Note that we are talking about a binary decision in a single dimension. When the number of dimensions increases the number of nomials will increase. This means that we are summing more than one normal. We will need a Gaussian mixture model when we sum normals. The usual insistences when adding normals is that need to have the same mean and standard deviation. Well, they don’t, hence the mixture models.

I took some notes from the Bionic Turtle’s YouTube on Gaussian mixture models. Watch it here.

Back when I was challenging claims that a distribution was binomial, I wondered where the fill between the normals came from. As I watched a ton of videos last night, I realized that the overlapping probability masses at the base had to go somewhere. I quickly annotated a graph showing the displaced probability mass in dark orange, and the places where the probability mass went in light orange. The areas of the dark orange should sum up to the areas of light orange. The probability masses are moved by a physics.

A 3-D Gaussian mixture model is illustrated next. I noted that there are three saddle points. They are playing three games at once or three optimizations at once. EM Clustering is alternative to the Gaussian mixture model.

So to sum it all up, **do not stop sampling just because you’ve achieved normality!**

Enjoy.