In the research for my previous posts on kurtosis, I ran across mentions of kurtosis risk. I wasn’t up to diving into that and getting too far away from what I was writing about in those posts. Then mc spacer retweeted More On Skew and Kurtosis. I reread the post and decided to conquer kurtosis risk. The exploration was underway.

One of the things they don’t teach you about in that intro stats class is the logical proof of what we are doing. We take a mean without checking its normality. We go forward with the normal distribution as if it were normal, ordinary, usual, typical, non-problematic. Then, we meet the data, and it’s anything but normal. When meeting the data, we also meet skew risk and kurtosis risk. It’s like meeting your spouse-to-be’s mom. Usually, you meet your spouse-to-be’s dad at the same time. Yeah, they all show up at the same time.

You might get taught various ways to approximate the mean when you have fewer than 30 data points, aka when your sample is too small. That space of fewer than 30 data points is where skew risk and kurtosis risk happen. The sample statistics drive around a while getting close to the as yet unknown population mean, equalling it a few times, circling it, and finally pulling in and moving in. Our collection of sample means eventually approximates the population mean.
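
To watch that driving around, here is a minimal Python sketch. The exponential population (true mean 1.0), the seed, and the name running_means are assumptions I picked for illustration; any skewed population would do.

```python
import random

def running_means(samples):
    """Return the mean of the first k samples, for k = 1..len(samples)."""
    means = []
    total = 0.0
    for k, x in enumerate(samples, start=1):
        total += x
        means.append(total / k)
    return means

rng = random.Random(42)  # fixed seed, an arbitrary choice for repeatability
# Hypothetical skewed population: exponential with true mean 1.0
samples = [rng.expovariate(1.0) for _ in range(5000)]
means = running_means(samples)

# Print the running mean at a few sample sizes to see it circle the true mean
for k in (5, 30, 500, 5000):
    print(k, round(means[k - 1], 3))
```

The early values wander; the later ones settle near 1.0. How early "settled" happens depends on the population, which is the whole point of the under-30 warning.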

In artificial intelligence, back in the old days when it was important to think like a human, back in the days of expert systems, we encoded the logic in augmented transition networks. A single transition would look like IF StopSign, THEN Stop. Of course, that’s not a network yet. That would wait until we wrote another, IF YieldSign, THEN Yield. That’s just another transition. Those two transitions would, with some additional infrastructure, become a network; thus they would become an augmented transition network. To make this easier, we used a descriptive language, rather than a procedural one. Prolog gave you the widest infrastructure. Prolog let you present it with a collection of transitions, and it would build the proof to achieve the goal. It built a tree and trimmed the inconsistent branches.
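
A rough sketch of those two transitions, in Python rather than Prolog. This is a simple forward-chaining loop, which is not how Prolog searches (Prolog works backward from the goal), and the names RULES and fire are hypothetical, made up for this illustration.

```python
# Each rule: (set of conditions that must all hold, conclusion to add)
RULES = [
    ({"StopSign"}, "Stop"),
    ({"YieldSign"}, "Yield"),
]

def fire(facts, rules=RULES):
    """Forward-chain: keep firing rules until no new conclusions appear.

    Returns only the conclusions that were derived, not the input facts.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known - set(facts)

print(fire({"StopSign"}))
```

Two rules in a list are still just two transitions. The loop is a stand-in for the "additional infrastructure" that turns them into something that reasons.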

We’ve seen that building of the tree and trimming of the inconsistent branches before. We use generative grammars to build a decision tree for a potential product, and constraints to trim that decision tree, so we arrive at the product fit for the moment. There is a logical argument to our product.

Similarly, there is a logical argument, or a proof, to our statistical analysis. There in that proof of our statistical analysis, our skew and kurtosis risk emerge.

Statistics happen after our data is collected. We think in terms of given (IF or WhatIF, WIF) this data, then these statistics. We don’t think of that driving around, that looking for the population mean, as a process. Statistics is static, excepting the Bayesian approach. Logic insists. The proof frames everything we do. When computing a mean, the proof is going to insist on normality. But, this logical insistence is about the future, which means we are actually doing an AsIf analysis. We imagine that we checked for normality. We imagine that we know what we are doing since nobody told us any different yet. An AsIf analysis imagines a future and uses those imagined numbers as the basis for an analysis. In that imagining of the future, we are planning, we are allocating resources, we are taking risks. With samples, those risks are skewness and kurtosis risks.

I’ve delayed defining skewness risk in this post until the very end. Once you understand kurtosis risk, skewness risk is nearly the same thing, so bear with me.

We will use the triangle model, which represents decision trees as triangles, to represent our proof.

In this figure, the root of the decision tree is at the bottom of the figure. The base of the tree is at the top of the figure. In the triangle model, the base of the triangle represents the artifact resulting from the decision tree, or proof.

Here we paired the distribution with its proof. A valid proof enables us to use the distribution. In some cases, the distributions can be used to test a hypothesis. An invalid proof leads to an invalid distribution which leads to an invalid hypothesis. Validity comes and goes.

OK, enough meta. What is Kurtosis risk?

When we assert/imagine/assume (AsIf) that the distribution is normal, but the actual data is not normal, we’ve exposed ourselves to kurtosis risk. We’ve assumed that the sample mean has converged with the population mean. We’ve assumed that we have a legitimate basis for hypothesis testing. Surprise! It hasn’t converged. It does not provide a basis for hypothesis testing.
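
Here is a hedged sketch of what that surprise looks like in numbers. The heavy-tailed data is a hypothetical Laplace-style sample I made up (the difference of two exponentials), and excess_kurtosis and tail_fraction are illustrative helper names, not anything standard.

```python
import random
import statistics

def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (about 0 for normal data)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3.0

def tail_fraction(data, k=3.0):
    """Fraction of points more than k standard deviations from the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) > k * sd for x in data) / len(data)

rng = random.Random(7)  # arbitrary fixed seed
normal_data = [rng.gauss(0, 1) for _ in range(20000)]
# Assumption: heavy-tailed stand-in for "actual data that is not normal"
heavy_data = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(20000)]

# What the AsIf-normal assumption predicts for the 3-sigma tails
normal_tail = 2 * (1 - statistics.NormalDist().cdf(3.0))

print("predicted 3-sigma tail:", round(normal_tail, 4))
print("normal data:", round(excess_kurtosis(normal_data), 2), tail_fraction(normal_data))
print("heavy data: ", round(excess_kurtosis(heavy_data), 2), tail_fraction(heavy_data))
```

If the heavy-tailed sample puts several times more points beyond three standard deviations than the normal assumption predicts, any plan built on the normal tails was taking kurtosis risk without knowing it.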

As an aside, WIFs (What IFs) are what spreadsheets are for. Pick a number, any number to see what the model(s) will do. AsIfs come from scenario planning, a process that is much more textual than numeric. A scenario is an outcome from various qualitative forces.

Back to it. Google sent me to Wikipedia for the above definition of kurtosis risk. I drew the definition and kept on thinking. This picture is the final result of that thinking.

We start with the top-down, footprint view of normal distribution, a circle. The brown vertical line extends from the green cross on the right representing the mean, median, and mode which are the same for distributions that are normal.

Then, we see that our actual data is an ellipse. The blue vertical line extends from the green cross on the left. That line is labeled as being the mode of the skewed normal. In previous discussions of kurtosis, we used kurtosis to describe the tails of the distribution. In some definitions, kurtosis was instead seen as describing the peakedness of the distribution; there, kurtosis described the core of the distribution.

I drew a line through the two means. This line gave us two tails and a core. I should have drawn the core so it actually touched the two means. Then, I projected the two tails onto an x-axis so I would have a pair of lengths, the cosines of the original lengths. That one is longer and the other shorter is consistent with previous discussions of kurtosis.

A note on the core: I’ve taken the core to be the most undifferentiated space under the curve. This is where no marketer wants to get caught. The circle that serves as the footprint of the normal is tessellated by some scheme. A shape in that tessellation represents the base of a histogram bar. From that bar, each adjacent histogram bar is exactly one bit different from that bar. The resolution of the shapes can be any given number of bits different, but that gets messy and, in the 3D graphic tessellation sense, patchy. A string “00000000” would allow its adjacent ring of histogram bars to contain up to eight different bars representing eight unique differences. “Ring” here is descriptive, not a reference to group theory. The histograms of the normal distribution encode all available differences. Refinements work outward from the undifferentiated mean to the highly differentiated circle of convergences, aka the perimeter of the normal distribution’s footprint. We are somewhere under the curve. So are our competitors. So are our prospects and customers.
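
The one-bit-different ring is easy to check with a few lines of Python. The function name one_bit_neighbors is my own, picked for illustration.

```python
def one_bit_neighbors(bits):
    """Return every bit string that differs from `bits` in exactly one position."""
    flip = {"0": "1", "1": "0"}
    return [bits[:i] + flip[b] + bits[i + 1:] for i, b in enumerate(bits)]

ring = one_bit_neighbors("00000000")
for neighbor in ring:
    print(neighbor)
print(len(ring))  # 8: one neighbor per bit position
```

Each neighbor has neighbors of its own, so repeating the step works outward from the undifferentiated center one difference at a time.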

An ordinary interpretation of a peak with high peakedness is uniqueness or focus. That’s a high kurtosis value. A peak that’s less peaked, rounded, smoother is less unique, less focused, possibly smudged by averaging, tradeoffs, and gaps. It all shows up in the histogram bars.

The other thing that shows up is the differences that are our product over the life of the product. A given iteration would have a particular shape. Subsequent iterations would build a path under the histograms that constitute the normal. Customers would cluster around different iterations. A retracted feature would show up as defections to competitors with different configurations more consistent with the cognitive processes of the defectors, our “once upon a time” users. Use tells. Differentiation segments.

So I attend to the tessellations and shapes of my histogram bars, to the sense of place, and to movement.

I then projected the core onto the sphere represented by the circle. Yes, the same circle we used to represent the footprint of the normal distribution. The core then appears as an ellipse. It should be closer to the pole; then it would be smaller. This ellipse should be the same shape as the top of the ellipsoid, containing the ellipse of the data, that the sphere is topologically deformed into.

Then, I drew a vector along the geodesic from the pole to the elliptical projection of the core to represent the force of topological deformation. I also labeled the circle and ellipse view to show how the deformation would be asymmetrical. The right is much less deformed than the left.

Next, I put the kurtosis in the summary view of a box chart using those lengths we found drawing a line through the two means. This box chart is tied to a view of the tails and kurtoses drawn as curvatures. As for the slopes of the distribution’s actual curve, they are approximations.

So what is kurtosis risk? When your sample means have not as yet converged to the population mean, you are exposed to kurtosis risk. Or, as Wikipedia put it, when you asserted that the data is normally distributed, but it wasn’t, that assertion gave rise to kurtosis risk.

And, what of skew risk? You expose yourself to skew risk when you assert that your data is symmetric, when in fact, it isn’t. In the math sense, skew transforms the symmetric into the asymmetric and injects the asymmetries into the curvatures of the kurtoses constraining the tails along the radiant lines in the x-axis plane.
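
A minimal sketch of checking the symmetry assertion before leaning on it. The exponential sample is a made-up stand-in for asymmetric data, and skewness here is the plain moment-based version, a helper I wrote for this illustration.

```python
import random
import statistics

def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (0 for perfectly symmetric data)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

rng = random.Random(11)  # arbitrary fixed seed
symmetric = [rng.gauss(0, 1) for _ in range(20000)]
# Assumption: exponential data as a hypothetical right-skewed sample
lopsided = [rng.expovariate(1.0) for _ in range(20000)]

for name, data in (("symmetric", symmetric), ("lopsided", lopsided)):
    print(name,
          "skew:", round(skewness(data), 2),
          "mean:", round(statistics.fmean(data), 3),
          "median:", round(statistics.median(data), 3))
```

When the mean and median pull apart and the skewness is well away from zero, the symmetric AsIf has already failed, and the two tails no longer mirror each other.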

This business of the assertion-base for statistics involves constant danger and surprise. A single inconsistent assertion in the middle of the proof can invalidate much of the formerly consistent proof of a once useful analysis. Learn more, be more surprised. Those intro classes blunt the pointed sticks archers call arrows. Before they were pointed, they were blunt–dangerous in different ways. Enjoy.