## A Few Notes

March 20, 2018

Three topics came up this week. I have another statistics post ready to go, but it can wait a day or two.

## Immediacy and Longevity

I crossed paths with a blog post, “Content Shelf-life: Impressions, Immediacy, and Longevity,” on Twitter this week. In it, the author talks about the need for a timeframe that deals with both the rapid immediacy and the longevity of a product.

When validating the Agile-developed feature or use case, achieving that validity tells us nothing about the feature or use case in its longevity. When we build a feature or use case, we move as fast as we can. The data is Poisson. From that, we estimate the normal. Then, we finally achieve a normal. Operating on datasets, instead of time series, hides this immediacy. Once that normal is achieved, we engage in statistical inference while continuing to collect data to reach the longevity. This data collection might invalidate our previous inferences. We have to keep our inferences on a short leash until we achieve a high-sigma normal, one big enough that it stops moving around and its radius stops shrinking.
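The short-leash idea can be sketched numerically. The snippet below is only an illustration under stated assumptions, not a method from the post: it draws Poisson counts at a hypothetical rate of 4 events per period, then reports the running mean and an approximate 95% interval as data accumulates. The helper names `poisson_sample` and `running_intervals` are invented for this sketch.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def running_intervals(lam, checkpoints, seed=0):
    """Return (n, mean, half_width) at each checkpoint (each must be >= 2)."""
    rng = random.Random(seed)
    data, results = [], []
    for n in range(1, max(checkpoints) + 1):
        data.append(poisson_sample(lam, rng))
        if n in checkpoints:
            mean = sum(data) / n
            var = sum((x - mean) ** 2 for x in data) / (n - 1)
            half_width = 1.96 * math.sqrt(var / n)  # normal approximation
            results.append((n, mean, half_width))
    return results

# lam=4.0 is a hypothetical event rate, not a value from the post.
for n, mean, hw in running_intervals(4.0, [10, 36, 100, 1000]):
    print(f"n={n:5d}  mean={mean:.2f}  +/- {hw:.2f}")
```

The interval is wide at small sample sizes and narrows as the count grows, which is one way to read "keeping inferences on a short leash."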

In the geometry sense, we start in the hyperbolic, move shortly to the Euclidean, and move permanently into the spherical. The strategies change, not the user experience. The user population grows. We reach the longevity. More happens, so more affects our architectural needs. Scale chasms happen.

The feature in its longevity might move the application and the experience of that application to someplace new, distant from the experience we created back when we needed validity yesterday, distant from the immediacy. The lengthening of tweets is just one example. My tweet stream has gotten shorter. That shortness makes Twitter more efficient, but less engaging. I’m not writing so many tweets to get my point across. There is less to engage with.

This longer-term experience is surprisingly different. In the immediacy, we didn’t have the data to test this longest-time validity. Maybe we can Monte Carlo that data. But how would we prevent ourselves from generating, in bulk, more of that immediacy data that won’t reflect the application’s travel across the pragmatism gradient?

The lengthening of the tweets probably saved them some money because they didn’t have to scale up the number of tweets they handled. They take up more storage, but no more overhead, a nice thing if you can do it.

## Longest-Shortest Time

Once the above tweet took me to the above post on the Heinz marketing site, I came across the article “The Longest Shortest Time” there. The daily crises make a day long, but the days disappear rapidly in retrospect. The now, the immediacy, is hyperbolic. The fist of a character in a cartoon is larger due to foreshortening. Everything unknown looks big when we don’t have any data. But, once we know, we look back. Everything is known in retrospect. Everything is small in retrospect. Everything was fast. That foreshortened view was fleeting. The underlying geometry shifted from hyperbolic to Euclidean as we amassed data and continues to shift until it is spherical. The options were less than one, then one, then many.

Value in the business sense is created through use. Value is projected through the application over time into the future from the past, from the moment of installation. That future might be long beyond the deinstall. The time between install and deinstall was long but gets compressed in retrospect. The value explodes across that time, the longest time. Then the value erodes.

In the even longer time, all becomes but a lesson, a memory, a future.

## Chasm Chatter

This week there were two tweets about how the Chasm doesn’t exist. My usual response to chasm mentions is just to remind people that today’s innovations are continuous, so they face no Chasm in the technology adoption lifecycle (TALC) sense. They may face scale chasms during upmarket or downmarket moves. But, there are no Chasms to be seen in the late phases of the TALC, the phases where we do business these days.

Moore’s TALC tells us about the birth and death of categories. Anything done with a product in an existing category is continuous. In this situation, the goal is to extend the life of the category by any means, innovation being just one of the many means. VCs don’t put much money here. VCs don’t provide much guidance here. And, VCs don’t put much time here either. The time to acquisition is shrinking. Time to acquisition is also known as the time to exit. In the early phases, all of that was different.

Category birth is about the innovator and those within three degrees of separation from the innovator. That three degrees of separation is the Chasm. It’s about personal selling. It’s not about mass markets. It’s about a subculture in the epistemic cultural sense. It’s a few people in the vertical, a subset of an eventual normal. It’s about a series of Poisson games. It’s about the carried content. The technology is underneath it all, but no argument is made for the technology. It isn’t mentioned. The technical enthusiasts in the vertical know the technology, but the technology explosion, the focus on the carrier, is in the future. It is at least two years away, and as much time will pass as needed. But, the bowling alley means it is at least seven years away.

Then comes the early mainstreet/IT horizontal. The tornado happens at the entrance. Much has to happen here, but this is a mass-market play.

After the horizontals, the premium on IPOs disappears. We enter the late phases of the TALC where innovation becomes continuous and no new categories are birthed. This is the place where people make errant Chasm crossing claims. This is where all the people claiming there is no Chasm have spent their careers, so no, they never saw a Chasm. They made some cash plays. They were serial innovators with a few months on each innovation, rather than ten years on one innovation that did cross the Chasm. Their IPOs didn’t make them millionaires because there is no premium. The TALC is converging to its right tail. The category is disappearing. They cheer the handheld device, a short-lived thing, and they cheer the cloud, another even shorter-lived thing, the end of the category where the once celebrated technology becomes admin-free magic.

So yes, there is no Chasm. But, my fear is that we will forget that there is a Chasm once we stop zero-summing the profits from globalism and have to start creating categories again to get people back to work. Then, we will see the Chasm again. It won’t be long before the Chasm is back.

Enjoy.

## Nominals II

March 15, 2018

I left a few points out of my last post, Nominals. In that post, the right-most distribution presented me with a line, rather than a point, when I looked for the inflection point between the concave-down and concave-up sections of the curve on the right side of the normal distribution.

A few days after publishing that blog post, it struck me that the ambiguity of that line had a quick solution tied to the fact that the distance between the mean and that inflection point is one standard deviation. All I had to do was drop the mean from the local maximum at the peak of the nominal and then trisect the distance between that mean and the distribution’s point of convergence on the right side of that nominal’s normal distribution.
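A quick numeric check of that shortcut, as a sketch only. It assumes, as the post does, that the visible foot of the curve sits three standard deviations from the mean (a true normal never quite touches the axis). The chart readings of 10 and 16 are hypothetical, and `trisect_sigma` is a name invented here.

```python
from statistics import NormalDist

def second_derivative(dist, x, h=1e-3):
    """Central finite-difference estimate of the pdf's second derivative."""
    return (dist.pdf(x + h) - 2 * dist.pdf(x) + dist.pdf(x - h)) / h**2

def trisect_sigma(mean_x, foot_x):
    """Estimate sigma as one third of the mean-to-foot distance."""
    return (foot_x - mean_x) / 3

# Hypothetical readings off a chart: peak at x=10, right foot at x=16.
mean_x, foot_x = 10.0, 16.0
sigma = trisect_sigma(mean_x, foot_x)
dist = NormalDist(mean_x, sigma)

# The sign of the second derivative flips at mean + sigma.
inside = second_derivative(dist, mean_x + 0.9 * sigma)   # concave down
outside = second_derivative(dist, mean_x + 1.1 * sigma)  # concave up
print(sigma, inside < 0, outside > 0)
```

The curve is concave down just inside one standard deviation and concave up just outside it, so the first trisection mark lands on the inflection point.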

Backing out of that slightly, every curve has at least one local maximum and at least one local minimum. A normal distribution is composed of two curves, one to the right of the mean and another to the left. Each of those curves has a maximum and minimum pair. The maximum is shared by both sides of the mean. A normal that is not skewed is symmetric, so the inflection points are symmetric about the mean.

Starting with the nominals comprising the original distribution, I labeled the local maxima, the peaks, and the local minima, the points of convergence with the x-axis. Then, I eyeballed each line between the maxima and minima pairs to find the inflection point between each pair. Then, I drew a horizontal line to the inflection point on the other side of the normal. Notice the skewed normal is asymmetric, so the line joining the inflection points is not horizontal. Next, I drew a vertical line down from the maximum of the normal distribution on the right. Then, I divided the horizontal distance from the maximum to the minimum on the right into three sigmas or standard deviations. The first standard deviation enabled us to disambiguate the inflection point on the right side of the distribution.

The standard normal is typically divided into six standard deviations–three to each side.

Here I’ve shown the original distribution with the rightmost nominal highlighted. The straight line on the right and the straight line on the left leave us unable to determine where the inflection point should be. My guess was at point A. The curvature circles of the tails did not provide any clarity.

I used the division method that I learned from a book on nomography. I drew a line below the x-axis and laid out three unit measures on it. Then, I drew a line from the mean on the x-axis through and beyond the left end of the first unit measure. Next, I drew a line from the distribution’s point of convergence on the right side of the normal through and beyond the right end of the third unit measure. The two lines intersect at point 3. The rest of the lines are projected from point 3 through the line where we laid out the three unit measures. These lines pass through the points defining the unit measures. These lines are projected to the x-axis.

Where the lines we drew intersect with the x-axis, we draw vertical lines. The vertical line through the mean or local maxima is the zeroth standard deviation. The next vertical line to the right of the mean is the first standard deviation. The standard deviation is the unit measure of the normal distribution. The vertical lines at the zeroth and first standard deviation define the width of the standard deviation. The vertical line demarking the first standard deviation crosses the curve of the normal distribution at the inflection point we were seeking. The point B is the inflection point. We found the standard deviation of the rightmost normal without doing the math.
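The projection construction can be sketched in coordinates. This is my reading of the drawing steps, not code from the nomography book: the unit measures sit on a line parallel to the x-axis at a hypothetical depth of 1 below it, and the function names are invented. The construction fails when the segment is exactly as long as the laid-out units, because the two guide lines are then parallel.

```python
def line_intersection(p1, p2, p3, p4):
    """Intersection of the line through p1, p2 with the line through p3, p4."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        raise ValueError("lines are parallel")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / denom,
            (a * (y3 - y4) - (y1 - y2) * b) / denom)

def divide_segment(mean_x, foot_x, parts=3, unit=1.0, depth=1.0, start=0.0):
    """Divide [mean_x, foot_x] on the x-axis into equal parts by projection."""
    # Unit measures laid out on a line parallel to, and below, the x-axis.
    marks = [(start + i * unit, -depth) for i in range(parts + 1)]
    # Guide lines from the mean and the foot through the outer marks meet here.
    apex = line_intersection((mean_x, 0.0), marks[0], (foot_x, 0.0), marks[-1])
    axis = ((0.0, 0.0), (1.0, 0.0))
    # Project from the apex through every mark back onto the x-axis.
    return [line_intersection(apex, mark, *axis)[0] for mark in marks]

# Hypothetical chart readings: mean at x=10, right foot at x=16.
print(divide_segment(10.0, 16.0))
```

Because the unit line is parallel to the x-axis, similar triangles guarantee that the projected points are evenly spaced, so the first one to the right of the mean marks one standard deviation under the three-sigma assumption.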

I put a standard normal under the rightmost normal to give us a hint at how far our distribution is from the standard normal. At that height, our normal would have been narrower. The points of convergence of our normal limit the scaling of the standard normal. A larger standard deviation would have had tails outside our normal.

Here I’ve shown the six standard deviations of the standard normal. I also rescaled standard normals to show how a dataset with fewer data items would be taller and narrower, and how a dataset with more data items would be shorter and wider. The standard normal with fewer data elements could be scaled to better fit our normal distribution.

In the original post, I wondered what all the topological tori would have looked like. I answered that question with this diagram.

Enjoy.

## Nominals

February 25, 2018

A tweet sent me to “Mean, Median, and Skew: Correcting a Textbook Rule.” The textbook rules are about the mean being in the long tail and the mode being in the short tail. The author discussed exceptions to this rule. Figure three presented me with a distribution that the author claims is an exception to the textbook rules. The author claims the distribution is a binomial. I annotated the figure. It’s definitely some kind of a nomial, but looking closer, it is not a binomial.

The nominal on the right side of the distribution shows us what we see if we look at the side of any normal: an aggregate curve comprised of a concave-downward curve and a concave-upward curve with a single inflection point between them.

The distribution on the left side is not the result of a single nominal. There are many inflection points. The left side of the distribution is concave down, concave up, concave down, and concave up. We can say the left tail is a single tail comprised of two presented lines, or we can say they are the overlap of two different distributions. That second concave down hides a distribution inside the base distribution.

The distribution gets called a binomial because it has two prominent peaks. But the left peak is an aggregate of at least one more nomial. Otherwise, we would add another set of inflection points. When making an argument about where the mean, median, and mode are, we have to consider each nomial to have its own triple. So there should be at least two triples, rather than one, as shown in the figure. I called the triple we were presented with an error, but it does present us with one of the exceptions the author wants to talk about. From this, we can take away the idea that these aggregate statistics hide more than they inform. I found myself in a Quora discussion on separating the underlying distributions of a binomial. There is math for that, math I do not know yet.

I am working on the assumption that all the underlying distributions are normal, a base assumption that is routinely made in statistics.
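One common way to separate the underlying normals numerically is expectation-maximization (EM) on a mixture of normals. The sketch below uses that technique in place of the eyeballing done in this post; it is not the post's method and not necessarily the math from the Quora thread. The data is synthetic, the component count of two is an assumption, and `fit_two_normals` is a name invented here.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def fit_two_normals(data, iterations=100):
    """Fit a two-component normal mixture with expectation-maximization."""
    # Crude starting guesses: one component at each end of the data.
    mus = [min(data), max(data)]
    spread = (max(data) - min(data)) / 4 or 1.0
    sigmas = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of component 0 for each point.
        resp = []
        for x in data:
            p0 = weights[0] * normal_pdf(x, mus[0], sigmas[0])
            p1 = weights[1] * normal_pdf(x, mus[1], sigmas[1])
            total_p = p0 + p1
            resp.append(p0 / total_p if total_p > 0 else 0.5)
        # M-step: re-estimate weight, mean, and sigma of each component.
        for k in (0, 1):
            r = resp if k == 0 else [1 - v for v in resp]
            total = sum(r)
            weights[k] = total / len(data)
            mus[k] = sum(ri * x for ri, x in zip(r, data)) / total
            var = sum(ri * (x - mus[k]) ** 2 for ri, x in zip(r, data)) / total
            sigmas[k] = max(math.sqrt(var), 1e-6)
    return weights, mus, sigmas

# Synthetic two-peaked data: normals centered at 0 and 5 (invented values).
rng = random.Random(1)
sample = ([rng.gauss(0.0, 1.0) for _ in range(500)]
          + [rng.gauss(5.0, 1.0) for _ in range(500)])
weights, mus, sigmas = fit_two_normals(sample)
print([round(m, 2) for m in mus], [round(s, 2) for s in sigmas])
```

Each recovered component gets its own mean and spread, which is the per-nomial view argued for above. Choosing the number of components remains a judgment call that this sketch does not address.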

The graph hides much as well, so I drew what I expected the distributions under the given “binomial” would be. I just eyeballed it.

I used arrows that match the color of the curve to show the concavity. Extra probability mass shows up at the intersections where distributions meet. I’ve labeled the probability mass at the intersections as gaps. Given the underlying distributions are only approximations, I didn’t make the green distribution, distribution 1, fit perfectly, so the thin layer of the second gap from the beginning lies on top of the distribution without involving a distribution. I used three different distributions to account for the tail convergence on the right. This gave rise to a gap. I didn’t catch this when I drew the figure. As I write this, there is no gap there. The red distribution accounts for that probability mass.

I went with a skewed distribution, distribution 1, to account for the second concave down section of the curve on the left side of the “second” nominal. A normal wouldn’t bulge outward under the exterior nominal, the black normal. A skewed normal has a long tail and a short tail. The intrinsic curvature of any long tail is low, so it has a large radius. The intrinsic curvature of any short tail is high giving us a small radius. The mean of this distribution is to the left. The median pushes the mean and mode apart symmetrically about the median. The median for distribution 1 leans to the right.

I went with three peaks on the left side of the “binomial.” I did this because distributions 2 and 4 have different heights. I know of no rules that would drive this decision. They could easily be one distribution.

As for the rest of our “binomial,” as demonstrated, it is actually a multinomial. We’ve ended up with five distributions, so we would have five different triples of mean, median, and mode. These triples were aggregated in the author’s numeric results. We can take it that when the mean, median, and mode are the same, we have a standard normal. The textbook rules about the tails and their relationships to the mean and mode still stand. Otherwise, we have numbers generated from an aggregate normal.
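A toy example of per-nomial triples versus an aggregated triple, using only the standard library. The two groups are invented measurements for illustration, not data from the article.

```python
from statistics import mean, median, multimode

def triple(values):
    """Return (mean, median, modes) for one set of observations."""
    return mean(values), median(values), multimode(values)

# Hypothetical measurements for two populations; invented for illustration.
group_a = [1, 2, 2, 2, 3]
group_b = [7, 8, 8, 8, 9, 9, 10]
aggregate = group_a + group_b

print("A:        ", triple(group_a))
print("B:        ", triple(group_b))
print("aggregate:", triple(aggregate))
```

The aggregate mean falls between the two groups, at a value where there are no observations at all, while each group's own triple sits inside that group.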

Don’t just accept the “binomial” allegation. If the numbers don’t make sense, they don’t make sense. When numbers don’t make sense, you’ve got more sense to make.

As a product manager, I don’t want to aggregate and drive that into a product that fits no one.

I went on to play with the “binomial” distribution some more.

I started with vertical slices for the Riemann integral. I also did this to give me a hint towards the factors involved in each slice. Due to my use of raster graphics, some slice lines are thick, because the intersections of the distributions are not points. Some intersections are lines. The point intersections give rise to vertical lines. The line intersections give rise to rectangles. Each vertical slice in those rectangles can differ. They are not uniform. Individual slices would still look like a solid rectangle.
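For readers who want the vertical slicing in numbers, here is a minimal midpoint Riemann sum. A standard normal stands in for the hand-drawn curve, which is an assumption, and `riemann_mass` is a name invented for this sketch.

```python
from statistics import NormalDist

def riemann_mass(pdf, left, right, slices):
    """Midpoint Riemann sum: add up the areas of thin vertical slices."""
    width = (right - left) / slices
    return sum(pdf(left + (i + 0.5) * width) * width for i in range(slices))

dist = NormalDist(0.0, 1.0)  # stand-in curve; the post's curve is hand-drawn
coarse = riemann_mass(dist.pdf, -1.0, 1.0, 4)
fine = riemann_mass(dist.pdf, -1.0, 1.0, 400)
exact = dist.cdf(1.0) - dist.cdf(-1.0)
print(round(coarse, 5), round(fine, 5), round(exact, 5))
```

Thinner slices bring the sum closer to the exact probability mass, which is why slice thickness matters when reading a raster drawing.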

The vertical lines tell us that at that moment in time, our organization, if we worked at the underlying granularity, would need some management adjustment to serve the underlying populations appropriately. This applies to both the gray and light blue lines or rectangles.

The blue lines show us where the associated distribution converges with the horizontal axis. That horizontal axis would move relative to any upmarket or downmarket moves the organization was undertaking over a period of time. I labeled these as ordering changes. But, the gray lines are ordering changes as well. Orderings come up when computing binomial probabilities and in game theory.

The pink area shows the expanse of a single factor mixture. Part of that area shows the factor associated with the black distribution quickly slowing down. I labeled that part of the black curve “Fast.” It also shows the factor’s deceleration slowing. I labeled that “Slow.” Otherwise, this slice is relatively stable. Note growth is not a positive notion here. In fact, the late phases of the technology adoption lifecycle, the orthodox management phase, are post growth and in decline–constant decline. The only options are to focus, an upmarket move, or to drop the price and move downmarket. Neither guarantees growth in itself.

From the mean of distribution 5, the purple distribution, all factors are in decline. But in the pink area, the factors are organized by a single constant factor curve.

In Upton’s “Aesthetics of Play”, the pink zone is a single play space. In his book, rules generate spaces and those spaces dictate process and policy. The technology adoption lifecycle (TALC) is based on this idea, but it is based on populations organized by that population’s pragmatism. The business facing that play space or population must eliminate its process and policy impedances to succeed. Addressed impedances constitute your organization’s design.

These spaces make those nascent moments, when we don’t yet have a normal, part of the difficulty of bringing another discontinuous innovation to market while sitting in the space where the company’s category is dying. The pink space is that end-of-life space. Notice how different the pink space is from any slice on the left side of the aggregate distribution.

Upmarket and downmarket moves move the feet of the distributions, the points of convergence with the horizontal. The new space might have additional intersections of the nominal distributions. Where this is the case, the factors for the new slices would change. This would repartition the existing populations as well. Where the nominals are normal, the additional populations gained by the move would not change the nominals other than at the feet. In upmarket moves, keep them large enough to maintain normality, or expect exposure to kurtosis risk.

In our diagrams, the red distribution seems high, which implies that it needs more density. The number of data points needs to be increased. This also implies that there should be some skew, but it is not apparent. As a distribution gains probability mass, it becomes lower and wider.

When looking for inflection points, those points can be lines. The nominal on the right exhibits that behavior. I went looking for what that means mathematically. The inflection point is ambiguous. I crossed paths with symplectic geometry. They deal with the same problem. The nice thing businesswise about this ambiguity is that it grants you some time to switch from growth to decline or from fast to slow. The underlying processes of the business need to change at all inflection points. The deal here between a point and a line is that a point is a sudden change requiring proactivity, and a line requires less proactivity.

Then, I wanted to see the toruses involved. So I started with the normal distribution on the right side of the “binomial.” I used the original distribution, not the teased-out distribution, so the distribution on the left only exposed its left side. Fitting a circle to the curve on the left was less clear.

Imagine if a torus pair were shown for each of the five distributions. Where a torus pair does not have the same radius in each constituent circle, there would be kurtosis, a pair of tails, and a median lean. The radii of the circles in that pair would change as the 2D slicings were rotated around the underlying distribution. The median lean results from the particular dimensions of the 2D slice. This generates some ambiguity in the peak, as the median for each slice would differ. By slicings, I mean taking slices around the circle giving us a collection of different slices. I do not mean rotating the same slice.

Where a torus pair has the same radius, the distribution has achieved normality. The kurtosis would be near zero, the median would no longer lean, and the mean, median, and mode would converge to the same value. The radii of the circles would not change as the 2D slicings were rotated.
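The statistical side of that claim can be checked with moment-based skewness and excess kurtosis. The sketch below says nothing about the torus picture itself. It uses synthetic samples, and it reads "kurtosis near zero" as excess kurtosis (kurtosis minus 3), which is an assumption about the intended meaning.

```python
import random

def shape_statistics(values):
    """Return (skewness, excess kurtosis) using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

rng = random.Random(7)
symmetric = [rng.gauss(0.0, 1.0) for _ in range(20000)]   # normal sample
skewed = [rng.expovariate(1.0) for _ in range(20000)]     # long right tail

print("normal:", [round(v, 2) for v in shape_statistics(symmetric)])
print("skewed:", [round(v, 2) for v in shape_statistics(skewed)])
```

The normal sample shows skewness and excess kurtosis near zero, while the long-tailed sample shows both well above zero.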

Next, I took horizontal slices as in Lebesgue integrals.
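One way to make horizontal slicing concrete is the layer-cake form of the integral: for each height, measure how wide the region under the curve is at that height, then stack the layers. The sketch below applies it to a single normal curve. That is a simplification of the aggregate figure, and `layer_cake_mass` is a name invented here.

```python
import math

def layer_cake_mass(sigma, layers=10000):
    """Integrate a normal pdf by stacking horizontal layers.

    Each layer at height t contributes (width of {x : pdf(x) > t}) * dt.
    """
    peak = 1.0 / (sigma * math.sqrt(2 * math.pi))
    dt = peak / layers
    total = 0.0
    for i in range(layers):
        t = (i + 0.5) * dt
        # The region where pdf(x) > t is an interval centered on the mean.
        half_width = sigma * math.sqrt(-2.0 * math.log(t / peak))
        total += 2.0 * half_width * dt
    return total

print(round(layer_cake_mass(1.0), 4))
```

The stacked layers add up to a total probability mass of about one, the same total the vertical slicing gives, only partitioned by height instead of by position.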

As discussed in regards to the vertical slicing, the gray lines indicate point intersections. The thicker gray lines indicate line intersections.

Where the vertical slice figure showed gaps, those gaps are comprised of a collection of Poisson distributions and a single collective normal. As a rule of thumb, Poisson distributions come to approximate the normal when they have 20 or more data points. By the same rule of thumb, the normal is achieved without approximation when 36 data points have been collected. Breaking a normal into subsets can give rise to Poisson distributions. So there is risk involved with these considerations. I highlighted these with yellow rectangles around the labels.
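How quickly a Poisson distribution closes in on a normal can be measured directly. The sketch below compares the Poisson probabilities with a normal of the same mean and variance. It reads the thresholds as values of the Poisson mean, which is my interpretation and not something stated above, and `max_gap` is an invented helper.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of k events, computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def max_gap(lam):
    """Largest absolute gap between Poisson(lam) and its matching normal."""
    sigma = math.sqrt(lam)
    upper = int(lam + 10 * sigma) + 1
    return max(abs(poisson_pmf(k, lam) - normal_pdf(k, lam, sigma))
               for k in range(upper))

for lam in (2, 5, 20, 36):
    print(lam, round(max_gap(lam), 4))
```

The gap shrinks steadily as the mean grows and never reaches exactly zero, so any cutoff is a convention about how much approximation error is acceptable.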

The skewed distribution, the green distribution, has been highlighted with the same yellow as the Poisson distributions because having not yet achieved normality, much will change and those changes will be rapid as normality is achieved.

The red arrows show the direction in which I expect the distribution to change. The left arrow associated with the skewed distribution only considers the movement of the foot, but everything will change with the skewed distribution. The base “binomial” will most likely change and give rise to an apparent third nominal on the exterior of the aggregate distribution. The peaks marked with down arrows can be expected to lose height or amplitude as more data is collected.

The median of the skew would become orthogonal. The change in its theta is not indicated on the diagram.

The intersections of the distributions will change, so they are highlighted in yellow as well.

The factor analyses also change when looked at from a horizontal slice point of view. You can consider the factors across a horizontal slicing to differ from the factors across a vertical slicing. There would be a collection of cubes if both slices were made. Those cubes would be N-dimensional, but given our slicings would be 2D, it would get messy. Cubing based on a factor analysis would be easier to operationalize in the sense of organizational design.

I labeled the slices. I had intended to provide a factor analysis for each slice. If I had the underlying data that would have been possible, but a graphical approach proved frustrating.

Next, I generated the probability of a portion of the AI slice under distribution 5, the purple distribution. A Lebesgue integral would achieve the same result.

The blue rectangle represents the probability mass under the purple distribution between the vertical constraints of the gray lines delineating that dimension of the slice AI.
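With the parameters of a normal in hand, the mass between two vertical bounds is a difference of cumulative probabilities. The numbers below are hypothetical, since distribution 5 was drawn by eye, and `slice_mass` is a name invented for this sketch.

```python
from statistics import NormalDist

def slice_mass(mu, sigma, left, right):
    """Probability mass of Normal(mu, sigma) between two vertical bounds."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(right) - dist.cdf(left)

# Hypothetical parameters; the post's distribution 5 was drawn by eye.
mass = slice_mass(mu=50.0, sigma=10.0, left=50.0, right=60.0)
print(round(mass, 4))  # about 0.3413, the mass from the mean to one sigma
```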

The author went on to give several examples of other aggregate distributions. He used these distributions to explore how the mean, median, and mode violate our expectations. So the textbook rules are violated by aggregates of underlying distributions, multiple distributions. This is true of the “binomial” example. As a rule, only consider those statistics to be valid at the level of the constituent nomials, rather than the aggregate nominal. Aggregate nominals frustrate the expected orderings of the statistical tuples.

I take it that the thick black line is the mode. On the left, we get the textbook ordering. Then, in the yellow rectangle to the right of 0.5, it changes to an exceptional ordering. At some point, it changes back to textbook ordering. And to the right of 0.75, the mean changes its tail association to being associated with the short tail. In the textbook ordering, the mean is in the long tail. This is where using a single number for kurtosis does not make sense. It only made sense in the standard normal sense, where the tails have identical values on both sides of the 2D slice involved.

The author went on to construct a distribution associated with the graph showing the tuple ordering exceptions. In a skewed normal, the median leans over to sit on top of the mode. This is the case in the aggregate distribution used here. The ordering is not exceptional, but the lean is not at the value of mode but along it. Where I annotated this as exceptional, the exception is the distance from the median to the mode. The ordering is not exceptional. It does, however, change the width of the separation between the median and the mode. The ordering is not symmetric around the median. The red lines are intended to show the median leaning on the mean so that the asymmetry relative to the mean, median, and mode is clear.

Then, I went on to explore the logic of the 2D slice. Here we are talking about the logic of the carried data, not the logic of the statistical carrier. The logic of the statistical carrier would be that of a normal distribution. With all the mathematical approximation formulas allowing us to convert from one distribution to another, we might ignore the logical constraints. I’m calling these distribution-to-distribution logical constraints the logic of the statistical carrier. The aggregation rules for a normal are an example of such carrier constraints. The carried logic is that of the collected data, rather than the collection and analysis of such data.

Logical consistency is tricky. Decades ago consistency was a true or false question. Was it consistent from the top to the bottom across every branch of the argument? These days that’s called absolute consistency. But now, we have relative consistency. It works from some absolute consistency to a branch of the argument that is consistent with itself and that base absolute consistency. Other branches would arise. Those branches would not demonstrate absolute consistency with other branches. This kind of consistency is relative consistency.

Statistically, the relative consistency would be a characteristic of each tail. Absolute consistency would be a characteristic of the core.

Relative consistency leaves us in a non-Euclidean space. That space typically would be hyperbolic involving manifolds, rather than functions. This calls into question the management practice of alignment and organizational structure.

In this figure, the logic of the tails is highlighted in pink. The question marks indicate where one would define shoulders, outliers, and distant outliers. What are your definitions of those boundaries? This is a 2D slice. Another 2D slice through the mean might require different decisions. Another slice would have a different set of curves. One of the slices would appear to be a standard normal with equal tails on both sides of its mean.

Relative consistency would start at the shoulder of a particular tail. Where you don’t differentiate the shoulders from the tails, a relative consistency starts with a particular tail. Each tail would have its own logic.

The last figure demonstrates the slices concept. The red line is closer to a standard distribution and its tails. The blue slice is definitely skewed. The thin blue line in the core is there to hint at the lean involved in that 2D slice. The red slice does not exhibit any lean. As more data of the dimension underlying the blue baseline is collected, the lean will disappear as will the asymmetry of the tails.

As a manager, big data is great if you have large existing populations and large existing collections of relevant data. Continuous innovation thrives in this situation. But, do be cautious of Poisson scale subsets. And, be cautious of any distribution summed to the existing normals. That data might be Poisson. And, that distribution would be skewed and kurtotic bringing you their relevant risks. Discontinuous innovation is blank space inventions tied to an absence of any relevant populations. These innovations have tiny networks. Data collected from those networks will be small data, Poisson, pre-normal, and will move across the terrain. It will be a long time before it settles down, but at the same time, it is a long way from being a commodity, or something that orthodox management practice can handle. It is a long way from the spherical geometry of that orthodoxy. It is a long way from the Euclidean of LP2. It is hyperbolic. All that distance implies there is real economic wealth to be created, and there is plenty of time to capture it.

The data collection and relevant distributions will mature.

Snapshot statistics are not all that informative. Watch your distributions dynamically.

Enjoy.

## Box-Whisker Charts

February 12, 2018

Twitter presented me with this box-whisker chart about perceptions of probabilities. The probabilities run from the most certain to the least certain. All these probabilities could be summed into a single normal distribution. I tried to put the footprints of all the distributions into a single footprint. I don’t have the tools.

Most of the distributions are skewed, so they are ellipses.

Each of these distributions appears to be mutually exclusive.

I already knew boxplots. So I tried to grasp the shape of the distributions. I annotated the above figure as shown on the right. I hacked the notation. I’ll discuss it in detail later in this post.

There are normal (N) and skewed (SK) distributions. Each of these box charts has a pair of tails, but as I went along, I realized there are three pairs of tails. Each pair of tails consists of a long tail (L) and a short tail (S). Once I realized there were three pairs of tails, I used L1, L2, L3, S1, S2, and S3 to label the tails. After that, I found tail pairs that were missing a tail. I used 0 (zero) to annotate them. Later, I realized that they really are don’t-cares. Their lengths are unknown.

The outliers are annotated with red “if”s. Including outliers or excluding them should be a matter of established policy. The costs of writing code for a transient population of outliers can be quite, and needlessly, expensive.
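An outlier policy can be made explicit in a few lines. The sketch below uses Tukey's 1.5 × IQR fences, a common box-whisker convention that is assumed here. The chart discussed above may use a different rule, and the data values are invented.

```python
from statistics import quantiles

def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using Tukey fences at k * IQR."""
    ordered = sorted(values)
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in ordered if low <= v <= high]
    outliers = [v for v in ordered if v < low or v > high]
    return kept, outliers

# Hypothetical observations with one distant value.
observations = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]
kept, outliers = split_outliers(observations)
print(kept, outliers)
```

Writing the rule down as a parameter (`k`) is the point: whether the flagged values are served or ignored becomes a stated policy rather than a case-by-case decision.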

I read “What a Boxplot Shape Reveals About a Statistical Data Set” and found a surprise. Boxplots assume a monomial distribution. The article compares two distributions with the same boxplot. They use the histograms to illustrate this problem. I’ve added the red text and the data point counts.
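The same-boxplot problem is easy to reproduce. The two datasets below are invented for this sketch and are not the ones from the article. One is evenly spread and the other has two clusters, yet both share a five-number summary and would draw the same box and whiskers.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot draws."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

# Invented data: same summary, very different shapes.
evenly_spread = [0, 1, 2, 3, 4, 5, 6, 7, 8]
two_clusters = [0, 2, 2, 2, 4, 6, 6, 6, 8]

print(five_number_summary(evenly_spread))
print(five_number_summary(two_clusters))
```

Only a histogram, or the raw data, reveals the second peak.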

The distributions shown do not have enough data points to use the Poisson distribution to estimate the normals. The distributions have not yet tended to the normal, so they are skewed. The box-whisker chart would tell us more about the skew.

As I wrote this post, I looked back at the article that contained the first graph in this post. The article contained two graphs of the actual distributions summarized in those first two box-whisker charts.

In this figure, I labeled the outliers. They appear as their own distribution. I’ve also labeled the nomials.

The same labels apply to this figure, the second figure illustrating the distributions.

The next thing to look at is the normals being added together to give us those multinomial and binomial distributions. I have edited the figure to the right. I used the tails that I could see to provide the missing tail, the tail under the adjacent normal. Once all the tails have been provided, there is left over probability mass that appears where the two normals intersect. I colored those blue and called them “mix” as this is where mixture effects occur.

Later in the upper part of the figure, I just used red Bezier curves to suggest normals. Initially, I understated the number of nomials involved. Then, I found more than one inflection point on a given tail. These bulge out at the side of the distribution. These bulges are caused by another normal inside or under the covering normal. These can oscillate in some situations. But, the peak of the normal underneath is never exposed, so you wouldn’t call it a nomial.

The previous figure shows us what the statistical distributions associated with the technology adoption lifecycle (TALC) would look like. They would be a series of distributions. They would not be a single distribution that just grows. The previous figure also looks like the pragmatism slices that comprise the TALC. Each pragmatism slice would have its own distribution. These distributions would aggregate into the TALC phase distributions.

While I was researching this, I watched a video on calculating multinomial probabilities. I watched the subsequent videos on this topic. It struck me that the independent, mutually exclusive probabilities used in these calculations give rise to a histogram, which in turn takes us back to the box-whisker chart and individual distributions. It also takes us back to the finite probabilities of the long tail of feature use. Once you have stable frequencies in your long tail, you would have a set of probabilities that add up to one. Changes to the UI would change the frequencies and subsequently change the probabilities.
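To make the frequencies-to-probabilities step concrete, here is a minimal Python sketch. The feature names and counts are invented for illustration, and the function names are hypothetical. It turns feature-use counts into probabilities that add up to one, then applies the standard multinomial formula to a small set of observed uses.

```python
from math import factorial

def frequencies_to_probabilities(counts):
    """Turn raw feature-use counts into probabilities that sum to one."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}

def multinomial_probability(observed, probabilities):
    """P(observed counts) = n! / (k1! * k2! * ...) * p1**k1 * p2**k2 * ..."""
    n = sum(observed.values())
    coefficient = factorial(n)
    for k in observed.values():
        coefficient //= factorial(k)  # each partial quotient stays an integer
    p = float(coefficient)
    for name, k in observed.items():
        p *= probabilities[name] ** k
    return p

# Hypothetical long tail of feature use (made-up numbers).
feature_use = {"search": 50, "export": 30, "share": 15, "print": 5}
probs = frequencies_to_probabilities(feature_use)

# Chance that the next two uses are one search and one export, in any order.
next_two = {"search": 1, "export": 1, "share": 0, "print": 0}
print(probs)
print(multinomial_probability(next_two, probs))
```

A UI change that shifts the counts changes `probs`, and every multinomial probability computed from it changes along with it.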

The figure above does not give us any hints as to skew or kurtosis. The box-whisker chart can provide some information. In the earlier histograms, the one on the right shows that we have a binomial distribution. The data sets for those distributions have too few data points so those distributions would be skewed. The peaks are medians. Those medians lean. They are not perpendicular to the x-axis. The lean pushes the mean and mode apart. With a few statistics beyond what the box-whisker chart is telling you, you will be able to determine how many nomials are involved.

## Analysing a Box-Whisker Chart

Then, we examine the symmetries with two tests: the core test (A) and the tail test (B). Before confirming normality via the core test, the red line, the median, would be black.

To do the core test, we draw 45-degree lines across both boxes from the shared location where the median intersects a rectangle containing the two squares, as shown. If the lines intersect the opposite corners, the boxes are squares. This implies that the boxes are the same size and that the distribution represented by the box-whisker chart is symmetric in terms of the core of the distribution. If the diagonal lines intersect the sides at the same height, again, the distribution is symmetric.

Next, we do the tail test. We measure both tails to determine whether they are equal in length, or whether one is shorter than the other.

If the box-whisker chart passes both tests, the distribution represented by the chart is symmetric, which in turn tells us that the distribution is normal (N). I annotate normal distributions using a red capital N. I also show the median as a red line at an angle of 90 degrees. The median does not lean in symmetric normals.
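The two tests can also be done with numbers instead of a ruler. The sketch below is one possible translation, not the drawing procedure itself. It assumes the chart is given as a five-number summary (whisker end, first quartile, median, third quartile, whisker end). The function name and the 5% tolerance are assumptions made for illustration.

```python
def box_whisker_symmetry(low, q1, median, q3, high, tolerance=0.05):
    """Apply the core test (A) and the tail test (B) to a five-number summary.

    tolerance is an assumed relative slack for calling two lengths equal.
    """
    left_core, right_core = median - q1, q3 - median
    left_tail, right_tail = q1 - low, high - q3

    def same_length(a, b):
        return abs(a - b) <= tolerance * max(a, b, 1e-12)

    core_ok = same_length(left_core, right_core)
    tail_ok = same_length(left_tail, right_tail)
    return {
        "core": core_ok,
        "tail": tail_ok,
        # N when both tests pass, SK otherwise, following the annotation in the text.
        "label": "N" if core_ok and tail_ok else "SK",
    }

print(box_whisker_symmetry(0, 2, 4, 6, 8))  # equal boxes, equal whiskers
print(box_whisker_symmetry(0, 2, 3, 6, 8))  # unequal boxes, equal whiskers
```

The second call mirrors the skewed case: the whiskers are the same length, but the boxes are not, so the core test fails.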

I used tick marks to indicate that the whiskers are the same length as is done in geometry.

In this figure, we examine a box-whisker chart for a skewed normal. The boxes are not the same size. Doing the core test, we find that the line for the left box intersects the box higher than the line for the right box. This demonstrates that the boxes are skewed. This was labeled with a red “SK.” Since we know the distribution is skewed, we can lean the median by taking the median’s cosine. This gives us the length of the median. As we angle this median, it will contact either the mode or the mean depending on which tail is long. Here we left the whiskers the same length. We labeled the long side (L) and the short side (S). I then drew the shape of the distribution in blue based on the information from the box-whisker chart alone.

Theta is the angle with which the median was leaned.

The mode and the mean are the same as the median in an unskewed normal. They separate symmetrically around the median in a skewed normal. They are shown, for this illustration only, as short, vertical black lines inside the box.

In box-whisker charts, the median is usually shown as a thicker line.

In this figure, we look for three pairs of symmetries. The distribution is normal, so the two members of each core and tail pair are the same length. This will not necessarily be the case with a skewed distribution.

I did not measure the outlier distances earlier. This is where that happens. If d(ab) = d(ac), d(ad) = d(ae), and d(af) = d(ag), where d is the distance function or metric, then the distribution pairs are normal. Otherwise, the unequal pairs are skewed, so they would have unequal core widths or tails.

Once we know which core widths or tails are long and which are short, we label them. Here the left core is narrower than the right core. All the other lengths are the same, but the asymmetry of the core makes all the tails on the left shorter in aggregate than those on the right. The summary notation of S and L was enough to convey all the relationships between the pairs of tails. The numbered notation gets more complicated later. Nothing guarantees a nice orderly set of relationships. Folding at the median will be informative in some cases.
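The pair-by-pair labeling can be sketched in Python as well. This is an illustration under stated assumptions, not the notation's definition. The median plays the role of point a, each pair is measured outward from it, `"="` is a made-up label for an equal pair, and a missing member is marked `"0"` with the surviving member treated as the long one. The function name and tolerance are hypothetical.

```python
def label_tail_pairs(median, left, right, tolerance=0.05):
    """Label three pairs (box edge, whisker end, outlier) as S/L, '=' or '0'.

    left and right list positions on the axis; None means that member is missing.
    """
    labels = []
    for i, (lo, hi) in enumerate(zip(left, right), start=1):
        if lo is None and hi is None:
            labels.append(("0", "0"))
        elif lo is None:
            labels.append(("0", f"L{i}"))
        elif hi is None:
            labels.append((f"L{i}", "0"))
        else:
            d_left, d_right = median - lo, hi - median  # d(a, left), d(a, right)
            if abs(d_left - d_right) <= tolerance * max(d_left, d_right, 1e-12):
                labels.append(("=", "="))
            elif d_left < d_right:
                labels.append((f"S{i}", f"L{i}"))
            else:
                labels.append((f"L{i}", f"S{i}"))
    return labels

# Narrow left core, everything else measured equal from the median.
print(label_tail_pairs(10, [8, 4, 1], [13, 16, 19]))
# A chart with no left whisker and no outliers on either side.
print(label_tail_pairs(10, [9, None, None], [12, 18, None]))
```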

In this figure, we get a messy ordering of the relationships. I’ve added some notation. Where you move from a short to long or long to short, the tails protrude. If everything on one side is short and everything on the other side is long, protrusions are less likely. They are not impossible because of the relative nature of shorts and longs.

Swaps are fairly active things, so they constitute a sensitivity driving kurtosis risk.

I connected these swaps on one tail. The other tail is swapped as well. Here SW23 means there was a swap in the second pair of tails, and another swap in the third pair of tails. The cores are just the first pair of tails. SW23 just condenses SW2 and SW3.

The next figure is a mess. Three measurements, members of each tail pair, are missing. S1, S2, and S3 are missing. The thick line on the rectangle is the perpendicular median. The only whisker is to the right, so that is where the long tail goes, to the right. That means the median leans left. There are no outliers to the right, so S3 does not exist. I use zero to indicate non-existence.

Every outlier has been labeled with a red “if.” Every outlier causes us to consider whether to leave it in or take it out. The further away from the mean it is, the more likely it will be eliminated. But standing business rules are better than ad-libbing here. Establish policies. Outliers are costly to serve.
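One way to turn the “if” into a standing rule is to write the policy down as code. The sketch below uses the common boxplot convention of flagging points more than 1.5 interquartile ranges beyond the quartiles. That convention, the function names, and the sample data are assumptions for illustration; the policy itself is a business decision.

```python
import statistics

def split_outliers(values, k=1.5):
    """Split values into (kept, flagged) using quartile fences at k * IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low_fence <= v <= high_fence]
    flagged = [v for v in values if v < low_fence or v > high_fence]
    return kept, flagged

def apply_policy(values, exclude_outliers=True):
    """The standing policy: decide once, then apply it the same way every time."""
    kept, _flagged = split_outliers(values)
    return kept if exclude_outliers else list(values)

sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]  # made-up measurements
kept, flagged = split_outliers(sample)
print(flagged)
```

Keeping `exclude_outliers` as an explicit argument means the choice is recorded in one place rather than ad-libbed per analysis.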

In this figure, I have annotated the curvature of one of the pairs of tails. Given three pairs of tails, there would be three toruses that could be generated by revolving the curvatures around the mean. A curvature is the reciprocal of a radius. This implies that high curvatures are tighter and smaller than low curvatures. The small orange circle has a tight curvature. The large orange circle has a looser curvature. A 2-D slice is shown. In n-D or 3-D, the two circles are part of the same torus revolving around the core of the normal. The surface would be smooth and continuous.

A tight curvature corresponds to a short tail as it is tangent to that tail. A loose curvature corresponds to a long tail as it is tangent to that long tail. As the distribution approaches normality, the curvatures equalize to some average curvature. The circles become the same size on both sides of the distribution in its 2-D view or slice. The curvatures of the standard normal are the same on both sides of the distribution.

Big data ultimately comes down to Markov chains that sequence individual distributions together. The original charts demonstrate how meaning is particular to place. Upton says as much in his The Aesthetic of Play, as does Moore’s Crossing the Chasm.

Enjoy.

## Skewed Normal

January 28, 2018

As a baseline, we’ll start with a top-down view of a normal distribution. The typical view is a side view. In the top-down view, the normal is the center of some concentric circles. In our graph, the concentric circles will have radii defined in terms of the statistical unit of measure, standard deviations. I’ve shown circles at 0, 1, 2, 3, and 5 standard deviations. The mean is shown at 0 standard deviations.

The core of the distribution is shown in orange. The horizontal view of the distribution defines the core as being between the inflection points (IP) of the normal curve. The core in a normal is the cylinder from the plane of the inflection points to the base of the distribution. The horizontal view can be rotated to align with the plane cutting the distribution for a particular dimension shown here as D1, D2, and D3. We only have three dimensions in this normal. With n dimensions, there would be n slices. With the normal, as long as the distribution is sliced through the mean, all the 2D projections would look the same. The normal is a symmetric distribution.

With a normal distribution, the mean, median, and mode have the same value. This, along with being symmetric, is a property of the normal. More specifically, a non-skewed normal. A standard normal is not skewed.

The normal can be estimated with a Poisson distribution of at least twenty data points. The Poisson distribution will tend to the normal between twenty and thirty-six data points.

The normal is usually used in the snapshot dataset perspective, rather than in a time series sense. But, the time series sense is significant when you wonder if you’ve collected enough data. A dataset should be tested for normality. It is usually assumed. The tests for normality are weak.
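Short of a formal normality test, a quick look at the sample moments can at least show whether a dataset is still lopsided. The Python sketch below computes the usual moment-based skewness and excess kurtosis. It is a sanity check under the textbook definitions, not a test of normality. The function name and sample data are made up.

```python
def shape_statistics(values):
    """Return (mean, skewness, excess kurtosis) from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5          # 0 for a symmetric sample
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return mean, skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 1, 1, 2, 10]

print(shape_statistics(symmetric))
print(shape_statistics(long_right_tail))
```

Recomputing these as each new data point arrives gives the time series view: the numbers keep moving until the data settle.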

Once the data achieve normality, the data tends to stay normal. The core and the outliers won’t move, and the standard deviation will stay the same. Until the data achieve normality, the distribution moves and resizes itself.

In the technology adoption lifecycle, the vertical phase will be the first time the normal is achieved. It will be a normal for the carried content. The horizontal, aka the early mainstreet market, has its own normal. The horizontal’s normal is for carrier components. The late market, likewise, has its own normal. In the late market, the focus is on carrier with the earlier carried discipline representing mass customization opportunities. The laggard and phobic phases are about form factors, carriers. Carried content may change in these phases. Carried content provides an opportunity to extend category life.

Preceding normality, the normal is skewed. In the next figure, I’ve put the skewed normal above the non-skewed normal.

Where the normal has a circular footprint, the skewed normal has an elliptical footprint. The median does not move. It tilts. This pushes the mode and the mean apart symmetrically around the median. The blue arrow shows how much the median tilts. The thick blue line shows the side view of the skewed normal. The core is shown in light orange. The tails are significant in the skewed normal. The skewed normal is asymmetrical. More on this later.

Each ellipse corresponds to the sigmas of our earlier diagram. But, the circular areas are the future. I’ve marked the outliers relative to the circular footprint of the non-skewed normal. The area I’m calling the deep outlier, the dark yellow population, is beyond what would be considered in the non-skewed normal. It would definitely be an error to collect data from that population, or since we sell to populations as we collect data from that population, it would be an error to sell to that segment of the population. Even after normality is achieved, outliers are more expensive than the revenues generated from that population.

The yellow populations are outliers, but they are outliers to the non-skewed normal. These outliers are shared by both distributions. The light green and even lighter green areas represent non-outlier populations that will be sold in the later normal, or as we sell to achieve normality. As the skewed normal achieves non-skewed normality, the ellipses will become circles. The edges located along the x-axis will move to the right. The tilted median will stand up vertically until it is perpendicular, and the mode and mean will converge to the median.

The ellipses would be thinner than shown. The probability mass under both distributions equals one so the ellipse would be less wide vertically than the circles. I had no idea about how wide those ellipses would be, but the figure is definitely wrong.

The skewed distribution exhibits kurtosis. I disagree with the idea that kurtosis has anything to do with peakedness. Other statisticians made this argument to me. The calculus view of the fourth moment disagrees as well. Kurtosis is about the tails and the shoulders as they relate to the cores. Some discussions ignore the shoulders. In this figure, I’ve included shoulders. I’ve used thick red lines and red text to highlight the components of the normal (N) and the skewed normal (SN). The normal only has one set of components. The skewed normal has two sets of components: one on the left, and another on the right.

I highlighted the shoulder of the normal. I highlighted the right and left shoulders of the skewed normal. And, lastly, I highlighted the right and left tails of the skewed normals.

The shoulders and tails are related to the cores. The normal core is a circle. The light orange ellipse of the skewed normal sits on top of it. I labeled both cores. The purple rectangle above the cores is the core of the skewed normal. The black core is the core of the non-skewed normal.

Kurtosis defines the curvature (κ) of the tails. I usually show these as circles defined as κ = 1/r. These circles are tangent to the tails of the normal. In a normal, these circles are the same size for both tails. In a skewed normal, the circles are vastly different in size. These circles in both cases generate a topological object: a torus for the normal, and a ring cyclide for the skewed normal. These topological objects are generated as we rotate 360 degrees around the median or mean of the normal. I showed this topological object in dark orange. In this figure, I showed them as ellipses. The circular version made the diagram very large. The ellipse for the ring cyclide on the left side is large. On the right, it is very small. This is due to the horizontal slice through the 3D objects. The xy-plane was used to produce the slice through both objects. Both objects are smooth and continuous, so another slice through the median would show a smaller circle on the left and a larger circle on the right. At some rotational angle, both circles would be the same, as in both curvatures would be equal. The thick vertical line through the median turns out to be the slice in which both curvatures would be the same. This curvature would be the average curvature.

When I put the left portion of the torus in the figure, the blue line representing the side-view of the normal was incorrectly drawn. The peak should have been at the mode. This was the second surprise. The median has more frequency, but it is tilted at an angle, an angle that makes it less high than the mode. The mode being the highest was one of those not-yet-known pieces of knowledge.

I’ll attempt a multimodal normal with opposing long tails. I was going to try to illustrate such a normal. There can be a multiplicity of centrality tuples, skews and long tails. With the tools I use now, that would be a challenge.

I’m looking at the Cauchy distribution now. There is no convergence. But, Cauchy sequences converge based on ε. You can pick your convergences. A footprint would be zeros. Different values of ε would give different footprints, and different conclusions of the underlying logical argument in the triangle model sense of the width and depth of a conclusion.

The first thing that surprised me in this post was how a portion of the outliers, the deep outliers, of the skewed normal is too far away from my market. And, how other portions of the outliers are outliers in both distributions. Another example of writing to think, rather than writing to communicate. Sorry about that.

Care must be taken to ensure this if you are going to market to outliers. I won’t.

Enjoy.

## Followup on The Dance of a Normal

January 15, 2018

When I wrote the post, The Dance of a Normal: Data Quantity and Dimensionality, I didn’t tie it back into product management. My bad. I’ll do that here. I was reminded that I needed to get that done by John D. Cook’s blog post, “Big data is not enough.”

When we construct a normal from scratch, we need 20 data points before we can estimate the normal, and then make any inferences with that normal. That’s 20 data points of the same measurement in the same dimension. If that measurement involves data fusion and we change that fusion, we have a different measurement, so we need to segregate the data. If we change a policy or procedure, those changes will change our numbers, or basis. If we change our pragmatism slice, those numbers will change. If we had enough data, each of those changes would be an alternative hypothesis of its own. Hopefully, they would intersect each other so we could test each of those hypotheses for correlation. But, we can’t just aggregate them and expect to make valid conclusions even if we now have 80 data points and a normal.

With those 20 data points, we have a histogram. We will also have kurtosis when we tell our tools to generate a normal with those 20 data points. We will have to check to see how many nomials we have. Each nomial will have a mean, median, and mode of its own. Those medians lean. Those medians remain the statistic of centrality while the mode and mean move out into the skew.

While you can estimate a normal from 20 data points, don’t expect it to be the answer. There is more work to be done. There is more logic involved. There is more Agile development to do. Don’t move on to the next thing until you have 36 data points for that dimension. If you release some new code, start the data point count over. This implies slack.
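The bookkeeping this implies is simple enough to sketch. The Python below keeps a separate data point count for each combination of dimension and release, so a new release starts its count over, and reports where each one stands against the 20 and 36 thresholds from the text. The dimension names, release tags, status labels, and function name are all hypothetical.

```python
from collections import Counter

ESTIMATE_AT = 20  # enough points to estimate a normal
STABLE_AT = 36    # enough points to move on

def readiness(observations):
    """observations: iterable of (dimension, release) tuples, one per data point."""
    counts = Counter(observations)
    status = {}
    for key, n in counts.items():
        if n >= STABLE_AT:
            status[key] = "stable"
        elif n >= ESTIMATE_AT:
            status[key] = "estimate only"
        else:
            status[key] = "too few"
    return status

# Made-up log: the v2 release reset the count for the latency dimension.
log = (
    [("latency", "v1")] * 40
    + [("latency", "v2")] * 25
    + [("clicks", "v2")] * 5
)
print(readiness(log))
```

Note that the 65 latency points are never pooled: 40 belong to one measurement and 25 to another.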

When I was managing projects, the mean would converge. When you see the same mean several days in a row, you’ve converged. Throw the data out and collect new data. Once the data converges, it is hard to move the number. Your performance might have changed, but the number hasn’t. Things hide in averages.
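“The same mean several days in a row” can be checked mechanically. The sketch below tracks the running mean and reports the first day on which it has stayed within a tolerance of the previous day’s mean for a given number of consecutive days. The run length of three days, the tolerance, and the function name are assumptions for illustration.

```python
def day_converged(daily_values, run_length=3, tolerance=0.01):
    """Return the 1-based day the running mean is judged converged, or None."""
    running_sum = 0.0
    previous_mean = None
    stable_days = 0
    for day, value in enumerate(daily_values, start=1):
        running_sum += value
        mean = running_sum / day
        if previous_mean is not None and abs(mean - previous_mean) <= tolerance:
            stable_days += 1
        else:
            stable_days = 0
        if stable_days >= run_length:
            return day
        previous_mean = mean
    return None

print(day_converged([5.0] * 10))        # steady performance
print(day_converged([1, 100, 1, 100]))  # still jumping around

# Things hide in averages: after 100 days at 5, five days at 10 barely move the mean.
history = [5.0] * 100 + [10.0] * 5
print(sum(history) / len(history))
```

The last line is the reason to throw the data out after convergence: performance doubled for five days, yet the long-run mean moved by less than a quarter of a point.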

Beware of dimensions. A unit of measure could be more than one dimensional when it’s used in different measurements. What is the logic of this sensor versus another? What is the logic of the illuminator? What is the logic of the mathematics? Are we assuming things? A change in any of that brings us to a new dimension. Write down the definition of each dimension.

The statistics for each dimension and each measurement take time to reach validity. The rush to production, to release, to iteration leaves us with much invalidity until we reach validity. The numbers your analytics kick out won’t clue you in. Kurtosis can give you a hint if it is not swamped. Slow down.

Once you have achieved normality with a measurement, how many sigmas do you have: 1, 3, 6, >6, 60? At three, your underlying geometry changes from Euclidean to spherical. Your business will change when your sigma is greater than six. You will have more competition and the number of fast followers will explode.

Adding data points will change the normal, which in turn changes the outliers. This will be even more the case when you attend to the changes to your dimensions and measures, and your TALC phases and pragmatism slices. The carried and carrier will have their own dimensions and measures. They will also have different priorities, and levels of effort. When moving from a carried layer to a carrier layer, the outliers would be different, because the carried and carrier have their own normal distributions, each with their own dimensions and measures. The emphasis changes, so the statistics change. The populations across the stack differ widely.

So much mess can be made with metrics. Gaps in the data happen. The past hangs around to assert itself in the future. When you drive down a road, adjacent houses can be from different decades. Data is likewise. The infrastructure helps eliminate gaps and the misallocation of data. It’s not as simple as measure to manage; you have to manage to measure.

Enjoy.

## Burst-and-Coast Swimming

January 13, 2018

“In contrast with previous experimental works, we find that both attraction and alignment behaviors control the reaction of fish to a neighbor. We then exploit these results to build a model of spontaneous burst-and-coast swimming and interactions of fish, with all parameters being estimated or directly measured from experiments.”

And,

“We disentangle, quantify, and model the interactions involved in the control and coordination of burst-and-coast swimming in the fish Hemigrammus rhodostomus. We find that the interactions of fish with the arena-wall result in avoidance behavior and those with a neighbor result in a combination of attraction and alignment behaviors whose effects depend on the distance and the relative position and orientation to that neighbor. Then we show that a model entirely based on experimental data confirms that the combination of these individual-level interactions quantitatively reproduces the dynamics of swimming of a single fish, the coordinated motion in groups of two fish, and the consequences of interactions on their spatial distribution.”

So what does this have to do with product management? It boils down to the technology adoption lifecycle (TALC). It basically describes organizing behavior, the organizing behavior of clients, customers, users, and markets.

Burst-and-coast swimming happens in the buy. An initial sale is a big effort on the part of the buyer and seller. Back when we sold software, the initial sale generated a large commission for the sales rep, and the subsequent upgrade sales generated smaller commissions: burst-and-coast. Then, came the install, burst-and-coast. Once we get our own software installed, we use it and hope we never have to deal with the guts of the application ever again. Well, that tells you, I’m not a hacker or a developer. If the effort is too high, I bail. Sorry, my bad.

Attraction and alignment behaviors control the reaction of a vendor to neighboring vendors. And, the customers do likewise. Once you get that first B2B early adopter in the first pragmatism slice, the client, and start selling to the first degree of separation prospects in the adjacent pragmatism slice, you see this burst-and-coast behavior. In the market, the followers follow the leader, the peak predator. The vendor and the vendor value-chain members do the same. Even fast followers don’t get ahead of the leader. The leader pays a price to lead. They own the burst. The fast follower doesn’t have the capabilities it needs to be the leader. The pragmatism slices are the arena walls. Address only the next pragmatism slice, not the current one, or the past ones. The pragmatism slices are not random.

For each adjacent slice and subsequent adjacencies, the business model must convince, so even the arguments, the explanations burst-and-coast. During that initial client engagement we build the client’s product visualization with our underlying technology, then in preparation for the chasm crossing, we build the first business case, and we have to help ensure the client achieves that business case. The chasm crossing is one of those arena walls sitting between two pragmatism slices that is problematic enough to warrant elevation to a TALC phase. We have plenty of time to ensure the business case. We are constantly addressing that business case. “Software by Numbers” describes how the client engagement should proceed. With each release, we have to convince the funding early-adopter client to fund the next release. Each release has to make the case for the next release. We cannot deliver the whole thing all at once. With Agile, we make the case at the level of each feature making its own case. So we have a school of code that heads off in a single direction effortlessly, or a school of developers, or a school of vendors and value chains. But, the convincing case for the client is not persuasion. It’s an obvious pathway in the functionality that takes the client to their value proposition by enhancing their competitive position, their place in the larger school of fish.

In the B2B early adopter phase, we are focused on the carried content. We have to build the carrier functionality at the same time, but that is in the background. The first business case is specific to the client and their role in the industrial classification tree, their specific subtree, no higher, lower, or wider. Care must be taken to keep the subtree small in the beginning. Stage-gate on the subtree. How much is enough? The big picture is hyperbolically away, and looking deceptively, and unsustainably small. You’re ten years away, this is no overnight unicorn. But, back to the fish, when you deliver to two channels: the carrier, and the carried, you have two schools of fish. They don’t swim together. There are arena walls.

In the next phase, the vertical market, there are more fish and more pragmatism slices. Sales will be random. Sales will ignore those arena walls. They are chasing money, not product evolution, subtree focus, or the future. Yes, we must pay for today and tomorrow, but outliers are costly and not an opportunity to swamp all the other considerations. Having one big customer is a bad, but attractive-looking, proposition. The vertical itself is an opportunity. Plenty of companies started, did business, and exited within this phase, never moving to the next phase. Companies lived comfortably in the vertical. It’s their school of fish. They go together.

Preparing for the next phase, the horizontal, or the IT horizontal, requires us to shift from carried to carrier. Preparation should have started years or months ago. Worse, this phase will be about aggregating all the companies written on top of the same underlying technology. Vertical products will be rewritten as templates on a single carrier. In the vertical, the carrier should have stayed unified and similar. It’s a different architecture. The school of fish, the customers, will be much larger and wider, but the customer is now the IT department. The previous customer was the non-IT, business unit executive.

Yes, this is not what we do today. These days everything is continuous, small, short-term financed, exiting soon, not changing the world stuff. But, it is also, not what they did yesterday either. I’m writing from the full-TALC course, not entering late-market only, aka starting in the middle, or near the end. The problems we face today from globalism won’t be solved with the innovations we do these days, the continuous stuff; the science and engineering-free, management only innovation. So start something that will last 50 plus years that starts at the beginning and exploits every phase of the TALC until it ends up in the cloud.

Discontinuous innovations give rise to entirely new value chains, new careers, unimagined futures, and unaddressed sociological problems that we have not addressed during the youth of the software age. We are older now. We are more orthodox now. Yes, that business orthodoxy is a school of fish. We used to swim outside those fish, but they have joined us because our venture funding still works, while theirs, the banks, don’t make loans like they used to. The school of fish that are banks has moved on, so the orthodoxy sees the innovators as prey, and we apparently agree. We have not pushed back and said, hey this donut shop is not an innovation. But, we are being taught a buzzworded definition of innovation.

Anyway, you grasp what I’m calling the burst and coast, the never-before-thought and the commodity, the innovator and the orthodoxy. Many fish of many species, all self-organizing and structured in ways that are difficult to see. We will meet many along the way.

And, one last thing, read widely. The last thing that will teach us anything these days is the innovation press. Always ask yourself what can this teach me regardless of what you’re focused on these days.

Enjoy.

## The Dance of a Normal: Data Quantity and Dimensionality

January 10, 2018

Last night I read a post on John Cook’s blog, “Formal methods let you explore the corners.” In it, he mentioned how in a sphere with high dimensionality, most of the mass is in the corners. He put a circle in a box to illustrate his point.
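The circle-in-a-box picture can be put into numbers. The Python sketch below uses the standard formula for the volume of an n-dimensional ball, π^(n/2) / Γ(n/2 + 1) for radius 1, and divides it by the volume of the enclosing cube of side 2. Whatever is left over is the share of the cube sitting in the corners. The function name is made up for illustration.

```python
from math import gamma, pi

def ball_to_cube_ratio(n):
    """Volume of the unit-radius n-ball divided by the volume of its enclosing cube."""
    ball = pi ** (n / 2) / gamma(n / 2 + 1)
    cube = 2.0 ** n
    return ball / cube

for n in (1, 2, 3, 5, 10, 20):
    inside = ball_to_cube_ratio(n)
    print(f"n={n:2d}  inside ball: {inside:.6f}  in the corners: {1 - inside:.6f}")
```

In two dimensions the circle fills π/4 of the square. By ten dimensions the ball holds well under one percent of the cube, so nearly all of the volume is in the corners.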

Last week there was a figure illustrating this. As the sphere gained additional dimensions, it became more cube-like. Given that a normal distribution looks like a circle when viewed from the top down, I drew what a high dimension normal would look like as it moved from n=1 to n>36, and dimensionality moved from 0 to high dimensionality. Of course, I threw in a few other concepts, so the figure moved from this goal to more. That’s drawing to think.

## Low and High Dimension Normals

The figure on the left provides a top-down view of the normal, as a circle, as n increases, and dimensionality increases.

At n=1, the distribution starts off with a Dirac function exploding off to infinity. The height of the line representing the Dirac function contains the entire probability mass of the distribution. Given the usual z-score representation, a line by itself can’t be a probability because we need an interval. The second data point will show up quick enough to put an end to that quandary. Exhale. The point representing the line of the Dirac function is in the center of the concentric circles under the wider purple line.

A few more data points arrive. At n=5, we have a cluster around the black point in the center of the concentric circles. Here some of the probability mass has flowed down to the x-axis and outward into the distribution. That distribution is not normal yet. These data points would present a Poisson distribution or a set of histogram bars on a line. Here that line would be a curve. But, the data points would be curved. This cluster is shown with darker gray.

At n=20, the Poisson distribution would tend to the normal. This normal is comprised of a core and tails. These are concentric circles centered at the mean, the black point in the center, the point representing the line of the former Dirac function. The core is shown as a lighter gray circle; the tail, the lightest gray. As the number of data points increases, the width of the core and the tails grow.

As the number of data points grows, the normal distribution loses height and the probability mass that comprised that height moves outward in the core and the tails. Black arrows to the right of the mean show this outward movement. The circles representing the core and the tails get wider. Once there are 36 data points, the width and height of the normal stabilize. As more data points are added, not much will change.

All of these changes in width and height were relative to a low number of dimensions. When you have fewer than 36 data points, the distribution would be skewed. This is ignored in the above figure. But, each dimension would be skewed initially and become normal as data points for that dimension are added. This figure is drawn from the perspective of a normal with more than 36 data points, hence no skew. Skew would appear in a top-down view as an ellipse.

Consider each dimension as having its own normal. Those normals are added together as we go. I do not know where the threshold between low dimension and high dimension normals would be. The high dimension normal footprint is shown as a rounded off square, or a squared off circle. It is pink. The corners get sharper as the number of dimensions increase. A black, double-sided arrow indicates the boundary between low dimension and high dimension footprints.

I used a light blue circle to demonstrate how the density in the high dimensional normal is not even. When a tail ends up in the corner, it is longer and the circle tangent to the normal curve is bigger. When a tail ends up on the side, it is shorter and the circle tangent to the normal curve is smaller. These black circles and ellipses represent the intrinsic curvatures, or kurtosis, of the tails, each given by the inverse of its radius.
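To put numbers on the "curvature is the inverse of the radius" idea, here is a minimal sketch. It assumes the circles are the osculating circles of the density curve in the usual side view, and it uses the ordinary plane-curve curvature formula, which is a geometric quantity and not the statistical kurtosis. The function names are hypothetical, chosen only for this illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def curvature(x, mu=0.0, sigma=1.0):
    """Plane-curve curvature of y = pdf(x): |y''| / (1 + y'^2)^(3/2)."""
    f = normal_pdf(x, mu, sigma)
    z = (x - mu) / sigma
    d1 = -z / sigma * f                      # first derivative of the density
    d2 = (z * z - 1.0) / (sigma * sigma) * f  # second derivative of the density
    return abs(d2) / (1.0 + d1 * d1) ** 1.5


def osculating_radius(x, mu=0.0, sigma=1.0):
    """Radius of the circle tangent to the curve at x (inverse of curvature)."""
    k = curvature(x, mu, sigma)
    return math.inf if k == 0.0 else 1.0 / k


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0, 3.0):
        print(x, curvature(x), osculating_radius(x))
```

Under these assumptions, the radius grows as we move further out along the tail, and it becomes infinite at the inflection points one standard deviation from the mean, where the curve is momentarily straight.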

The normal we are used to viewing is a two-dimensional slice through the mean, so we have two tails. In a three-dimensional normal, we can rotate the slicing plane through the mean and get another two tails. With the standard normal, all the slices would look the same. The tails would be the same. The circles representing the intrinsic curvatures would be the same. But, when the normal is skewed, the slices would differ.  The tails would differ with one side being longer than the other. The circles representing the intrinsic curvatures would differ as well. The shorter tail would give us a smaller, tighter circle. The longer tail would give us a larger, looser circle.

If we rotated our slicing around the normal through the mean, in the high dimensionality situation, we would see the tails being the same on both sides, but each slice would have tails of different length. In the low dimensionality situation, the tails would be the same in all slices.

The intrinsic curvatures are shown in black on the left side of the normal. I’ve put red spheres inside those curvatures to hint at the topological object, the aggregate of those spheres, shown with the thick red lines, lying on top of the normal.

The pink footprint meets the light blue circle at the rounded off corners of the footprint of the high dimensional normal but diverges at the sides. There is no probability mass at the sides as it flowed into or was pushed into the distribution envelope suggesting higher densities inside the distribution along the sides. The light blue arrows indicate this.

The corners have the longest tails. The sides have the shortest tails. Given that the slices made by the planes slicing the distribution through the mean are symmetric, the tails are the same on both sides of the mean.

## Black Swans and Flow of Probability Density

In the center figure, I showed the usual side view of the normal. I drew two pink lines to show where the high dimensional footprint ended. The high dimensional footprint has less width except at the corners, so rotating the high dimensional normal relative to the low dimensional normal would move those pink lines. This reflects a risk mechanism similar to skew risk and kurtosis risk.

I projected those pink lines. Superimposing the low and high dimensional normals presents us with two black swans if we go with the x-axis, or two shorter tails if we go with the x’-axis.  The two black swans appear as cliffs, the horizontal lines between the x and the x’ axes. The length of those lines represents the thickness of the bit loss. The tail volumes were lost between the pink lines and the outer gray lines labeled Prior Tail. The blue rectangles beside the distribution indicate where the tail volumes were lost. Where tail volumes were lost, we renormalize the distribution. In a high dimensional normal, these volumes would be small. These volumes contain probability masses in low dimensional normals. In high dimensional normals, all the probability densities are on the surface of the distribution.

In the center figure, I’ve used thin, light blue arrows to clarify the flow of probability density from the Dirac function into the normal.

## Intrinsic Curvature

The figure on the right illustrates, with a side view of the normal, the effects of skew and the presence of a torus or, more accurately, a ring cyclide. I first discussed this ring cyclide in The Curvature Donut.

The purple line in the figure on the left represents the x-axis of the horizontal view of the normal. In the figure on the right, the purple line is the x-axis. I used a standard normal but added the circles representing the intrinsic curvatures in red. Since the standard normal is symmetric, both of the outer intrinsic curvatures are the same size. This symmetric situation gives us a torus topologically. The torus sits flatly on top of the tails of the normal. This is the high dimension and the high number of data point cases. Then, I hinted at a skewed distribution, aka the low number of data point cases, with the angled line of the median and a short tail. That short tail would have a smaller circle representing its intrinsic curvature. This gives us a ring cyclide topologically. The ring cyclide sits tilted on the tail of the normal.

I then superimposed the smaller circle and the larger one from the skewed situation. The smaller circle represents maximal curvature; the larger circle, the minimal curvature. Then, with the black circle, I averaged the two so I could get down to one kurtosis value. Kurtosis is one number. You might tell me that it represents the kurtosis of the standard normal, but skew is tied to kurtosis, so there should be two numbers, and they would not be equal. This average business is just my guess. I still don’t see the height of the distribution as being indicated by kurtosis. Still wondering.

Kurtosis as peakedness is stated all over the literature as the ground truth, but a few authors and I say that that doesn’t make sense. The fourth moment, the calculus definition of kurtosis, has nothing to do with peakedness.
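For reference, here is a minimal sketch of the moment definitions, assuming the population (divide by n) form of the standardized moments. Skewness is the third standardized moment and kurtosis is the fourth. The helper names and the toy data are hypothetical, made up for this illustration.

```python
def standardized_moment(data, k):
    """k-th central moment divided by the standard deviation to the k-th power."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    if var == 0:
        raise ValueError("standardized moments are undefined when all points are equal")
    central = sum((x - mean) ** k for x in data) / n
    return central / var ** (k / 2)


def skewness(data):
    """Third standardized moment."""
    return standardized_moment(data, 3)


def kurtosis(data):
    """Fourth standardized moment (not the 'excess' form, so a normal gives 3)."""
    return standardized_moment(data, 4)


if __name__ == "__main__":
    lopsided = [0, 0, 0, 0, 10]   # one point far out on the right
    two_point = [-1, 1]           # no tails at all
    print(skewness(lopsided), kurtosis(lopsided))
    print(skewness(two_point), kurtosis(two_point))
```

Because each deviation is raised to the fourth power, points far from the mean weigh much more heavily in the sum than points near it.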

The x and x’ axes here are accidental. The black swans show up as well. Again, accidental.

I left a comment about multidimensional normals in John Cook’s blog. He replied while I was writing this. I will have to think about it a while, and I may have to revise this. See his Willie Sutton and multivariate normal distribution.

As always, enjoy.

## Pragmatism

December 10, 2017

Pragmatism organizes the technology adoption lifecycle (TALC). The TALC is usually represented by a set of normal distributions summed into the normal we use to summarize what’s going on. We see the phases, the larger scale pragmatism outcomes. We do not see the smaller scale pragmatism outcomes within the phases, the pragmatism slices.

To begin in the beginning, we illuminate when we don’t have a sensor that can detect a signal. Otherwise, we go straight to the sensor, which gives us data in some range. We might have to clean it up. For a normal distribution or a Poisson distribution, we count up how often a value occurred, or the arrivals of values.

Eventually, we end up with a distribution or an envelope for randomness. That distribution houses the “noise.” We captured the data points. We summarize the data points into parameters that determine the shape of the distribution we are using to summarize our data. We make a normal with just two parameters: the mean and the standard deviation. With three pairs of numbers, we have the three normals of the TALC covered.

The TALC is a system built on noise. Yes, sorry, but sales is a random process. Marketing likes to think of itself as a methodical organization. Marketing discovers prospects, nurtures prospects, uncovers the buying process and the participants in the buy, and once the nurturing process moves all of those participants into the “I want this” state, they set the appointment for the sales rep. Then, sales throws that lead in the trash.

While marketing was busy with all of that, sales picked up the phone and random walked themselves to revenue. And, finally, having sold, management tells the sales rep that they can’t do the deal because the prospect is an outlier. Just another day in the war between marketing and sales.

The TALC is anything but random. The TALC is a highly organized stochastic system. It’s like a radar. A radar sends out noise in a given distribution, a physical one. Only the frequencies that fit in the pipe make it to the antenna where they are transmitted. Then, they bounce off stuff and get back to the antenna where they again have to fit in the pipe. Outliers are trashed. In a company, that outlier prospect moves the population mean too far at too high a cost, so the company refuses to sell to them right now. A few years from now, that too-far-at-too-high-a-cost problem will be so yesterday.

Marketing already knew that. But, marketing is not random. Marketing has to be pragmatic when it faces a population organized by pragmatism. All that population wants is a business case that makes the buy reasonable. Reasonable is the real organizer. Jones bought this and got a hell of a success from it. But, you know us, we are not like Jones at all. Jones is an early adopter. We wait, not long, but we wait. We want to see the successes of businesses like ours. Jones is too early for our tastes. Just like sales is too early.

That the TALC is based on a summed set of normal distributions doesn’t help either. Those normals make this a stochastic system. The prospects do a random walk towards us. And, we do a random walk to our qualified prospects. “Qualified” filters those prospects. But, so does pragmatism.

I came across the “Markov Chains: Why Walk When You Can Flow?” blog post. Twitter random walks all of us. This post is about random walks.

The author started with an application demonstrating a random walk under a normal distribution. He shows the next attempted step in the random walk with a vector that is either red for failure or green for success. When the vector is green, the next step is taken, which results in a new data point being added to the distribution. When the vector is red, the data point is not added to the distribution, and another step is attempted.
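The red and green vectors can be mimicked with a small random-walk Metropolis sketch. This is not the code behind the demonstration. It assumes a standard two-dimensional normal as the target and a Gaussian proposal, and the function names, step size, and seed are hypothetical. One caveat: in the textbook algorithm, a red (rejected) step records the current position again instead of recording nothing, and the sketch follows that convention.

```python
import math
import random


def log_density(x, y):
    """Log of an unnormalized standard bivariate normal (the assumed target)."""
    return -0.5 * (x * x + y * y)


def metropolis_walk(n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis. Returns (chain, number_of_green_steps)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    chain = []
    green = 0
    for _ in range(n_steps):
        # Propose the next step: the vector drawn in the demonstration.
        px = x + rng.gauss(0.0, step_size)
        py = y + rng.gauss(0.0, step_size)
        log_ratio = log_density(px, py) - log_density(x, y)
        # Accept with probability min(1, target(proposal) / target(current)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, y = px, py  # green: the step is taken
            green += 1
        # red: the walker stays where it is and tries again on the next pass
        chain.append((x, y))
    return chain, green


if __name__ == "__main__":
    chain, green = metropolis_walk(20000)
    print("share of green steps:", green / len(chain))
```

Steps toward higher density are always green. Steps toward lower density are green only some of the time, which is what keeps the walk under the target distribution.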

I annotated the author’s figure to show where the outliers sit, the Markov chain underlying the Metropolis-Hastings random walk and the TALC phases.

On the y-axis normal, I indicated where the data generated by the random walk are either over or under the expected frequencies. Then, I added a hypothetical path via the green vectors. I colored the outliers in gold, but later I realized that there were more outliers beyond the six sigmas of the normal representing the TALC. I used the red circle to divide the additional outliers from the non-outlier tail of the normal.

Then, I labeled the TALC. That labeling might be unfamiliar. From the left, EA is the early adopter; C is the Chasm; V is a vertical market. The bowling alley (BA) is composed of the early adopter and their vertical. The Chasm guards entry into the vertical. The technical enthusiasts are present across the TALC, not just at the beginning, so they have their layer. Their layer includes the cloud form-factor (C) as part of the technical enthusiast layer. This population was formerly considered to be phobics (P) or non-adopters, but the disappearance of the technology and admin-free/infrastructural, aka somebody else’s problem presentation fits the needs of phobics. Then starting at the right again after the vertical phase, at the tornado (T), enter the early mainstreet (EM), otherwise called the horizontal (H) or IT horizontal phase. Next, we enter the late mainstreet (LM), otherwise called the consumer phase. We exit the late mainstreet one of three ways: the M&A, through a second tornado (T), or by moving through or to the form factors of the device (D) phase, and the cloud (C) phase. NA here means non-adopter.

We may extend the life of the category by going down market. The gray outermost circle represents the extent of the down market move.  This is where Christensen disruptions live, in the down market. They live elsewhere as well, but all of them are firmly anchored in the late mainstreet or consumer phase. Foster disruptions require discontinuous invention and innovation prior to the technical enthusiast phase.

I further illustrated progress through the TALC with thick red and blue arrows. Discontinuous innovations need the full pathway starting with the technical enthusiasts (TE) phase. Continuous innovation can start anywhere. These days it is typical to be in late mainstreet (LM) leaving a lot of money on the table, but the VCs investing there only know that phase, so they do not reap the returns that paid for everyone else. Cash is the game in the late mainstreet. B-schools preach the late mainstreet with its steady long-term commodities and the sport of competition.

The extent of the downmarket is shown with the light blue horizontal lines and the angled line that denotes the end of the category. The line the company going downmarket ends up on depends on how far downmarket they went. The end of the category depends on the extent of the downmarket move as well.

The author talks about the efficiency of the next step in the Markov path and how one explores only the areas under the normal that need to be explored. So his next figure takes a random walk around a narrow ring under the normal.

In this figure, you see one phase of the TALC being rotated around under the normal. This would be the technical enthusiasts in their phase and the phobic or cloud phase. We find the next data point less often. The frequency of a given data point would be the same as if the full normal were used, but the overall process is faster when the area being explored is smaller.

So the math works out to be A=π(R^2-r^2) vs A=πR^2, which means that the ring does not take as long to compute. But, in a stochastic system, the random number generator knows nothing of rings, so many numbers get generated and disposed of unused. Smaller targets are harder to hit.

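A minimal sketch of that trade-off follows, assuming plain rejection sampling from a square that encloses the disc. This is a stand-in chosen for illustration, not the sampler the author of the ring figure used, and the radii, seed, and function names are hypothetical.

```python
import math
import random


def ring_area(R, r):
    """Area of the ring between inner radius r and outer radius R."""
    return math.pi * (R * R - r * r)


def disc_area(R):
    """Area of the full disc of radius R."""
    return math.pi * R * R


def sample_ring(n_points, R, r, seed=0):
    """Draw uniformly from the enclosing square and keep only hits inside the ring.

    Returns (kept_points, total_draws) so the wasted draws can be counted.
    """
    rng = random.Random(seed)
    kept = []
    tried = 0
    while len(kept) < n_points:
        x = rng.uniform(-R, R)
        y = rng.uniform(-R, R)
        tried += 1
        if r * r <= x * x + y * y <= R * R:
            kept.append((x, y))
    return kept, tried


if __name__ == "__main__":
    R, r = 1.0, 0.9
    kept, tried = sample_ring(1000, R, r)
    print("ring area / disc area:", ring_area(R, r) / disc_area(R))
    print("draws needed for 1000 hits:", tried)
```

The ring covers less area, so there is less to explore, but a generator that knows nothing of rings throws away every draw that lands in the hole or outside the disc.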
I annotated this one as well. There is a lot going on in that ring.

The normal distribution in the ring is a circular normal. With a non-circular normal, the normal would be skewed until the density was consistent throughout the ring. That the distribution is not normal across the entire topology leaves us with skew and kurtosis. For the time being the distribution is trinominal. And, those uninominal are interspersed with Poisson arrivals that eventually tend to and achieve a normal. Those Poisson distributions occur in the still empty areas of the ring.

Again, I’ve color coded the areas under the distribution used as being over and under the frequencies intended by the distribution being used, the target distribution.

This figure shows us what a pragmatism slice looks like. But, in the TALC, we haven’t gone far enough in defining the target area yet.

Here I went back to the TALC and focused on the technical enthusiast in the beginning of the TALC (TE) and those last two phases beyond the Late Mainstreet (LM), as in phobic (P) and laggard (L) or the device (D) and cloud (C) phases. There are real differences in mission between the early and late phases. There are real differences between outcomes, such as an IPO premium for early phases, and no such thing for late phases. The early TEs play with the technology. The late TEs migrate the product to the new form factors. The late TEs might have to develop a product for the company that eventually acquires the TE’s company. Macromedia developed Captiva to this end. So these different times are looking for very different target populations.

Each pragmatism ring serves different roles in the software as media model. Early TEs play with the carrier. Late TEs play with different form factors, different carriers. Late TEs also distribute components differently. Each phase has different expectations and different levels of task sublimation. Task sublimation would be counter to the need of those in the early phases, yet essential to those in the late phases. The generic “task sublimation is good” finding is not so good as a generic piece of advice. Likewise design, or the notion that dot 1.0 functionality was awful. No, it wasn’t awful. It served geeks just fine. We didn’t ask developers to respect the carried domain, and really, we still don’t. Observation and asking questions are insufficient for what needs to be achieved.

The functionality problem is still with us, unsolved. Hiring UX developers still leaves the non-UX developers to code their functionality as they please. They still don’t do UX.

Those Poisson games played during the search for the next technology, those Poisson distributions, show up throughout the TALC any time we don’t have a valid sample or a normal free of skew and kurtosis.

Attend to your pragmatism slices. Don’t jump ahead then jump back. You moved your normal. They don’t go back well. Ask the next slice, the prospect slice, about what they need. Do this independent of your install base, your customers. Those prospects will need something different from your customers. You might as well have different lists for each of those slices. The carrier and the carried slices would be different as well. The carried and carrier code really can’t be written by the same developers. The disciplines being coded are too different. The carrier is easier than the carried. In general, we mess carried up. We know carrier. That’s where most developers live.

Way back in the nascent internet days, a developer was all hot to write an electronic store, but when I asked him if he had ever worked in a store, he said no. He was enamored with the carrier and thought the carried would be easy. Sorry, but stores have managers that live stores. A database developer lives databases. A database can be a nice metaphor for a store, but that’s poetry, not a store.

Enjoy your pragmatism slices. Don’t turn them into onions.

## From Time Series to Machine Learning

December 4, 2017

This post, “Notes and Thoughts on Clustering,” on the Ayasdi blog brought me back to some reading I had done a few weeks ago about clustering. It was my kind of thing. I took a time series view of the process. Another post on the same blog, “The Trust Challenge–Why Explainable AI is NOT Enough,” boils down to knowing why the machine learning application learned what it did, and where it went wrong. Or, to make it simpler, why did the weights change? Those weights change over time, hence the involvement of time series. Clustering changes, likewise, in various ways as n, n as time, changes. Again, time series is involved.

Time is what blew those supposed random mortgage packages up. The mortgages were temporally linked, not random. That was the problem.

In old 80’s style expert systems, the heuristics were mathematics, so for most of us the rules, the knowledge was not transparent to the users. When you built one, you could test it and read it. It couldn’t explain itself, but you could or someone could. This situation fit rules 34006 and 32,***. This is what we cannot do today. The learning is statistical, but not so transparent, not even to itself. ML cannot explain why it learned what it did. So now there is an effort to get ML to explain itself.

Lately, I’ve been looking at time series in ordinary statistics. When you have fewer than 36 data points, the normal is a bad representation. The standard deviations expand and contract depending on where the next data point is. And, the same data point moves the mean. Then, there is skew and kurtosis. In finance class, there is skew risk and kurtosis risk. I don’t see statistics as necessarily a snapshot thing, only done once you have a mass of data. Acquiring a customer happens one customer at a time in the early days of a discontinuous innovation in the bowling alley. We just didn’t have the computing power in the past to animate distributions over time or by each data point. We were asked to shift to the Poisson distribution until we were normal. That works very well because the underlying geometry is hyperbolic, explaining why investors won’t put money on those innovations. The projections into the future get smaller and smaller the further out you go. The geometry hides the win.

It turns out there is much to see. See the “Moving Mean” section in the “Normals” post for a normal shifting from n=1 to n=4. Much changes from one data point to the next.
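To watch the mean and standard deviation move with each arriving data point, here is a minimal sketch using Welford's online update. The class name and the sample stream are hypothetical, made up for this illustration, and this is not the method used to draw the figures in the “Normals” post.

```python
import math


class RunningStats:
    """Mean and sample standard deviation, updated one data point at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n          # the new point moves the mean
        self._m2 += delta * (x - self.mean)  # and stretches or shrinks the spread

    @property
    def sample_sd(self):
        # With a single point there is no spread to estimate yet.
        if self.n < 2:
            return None
        return math.sqrt(self._m2 / (self.n - 1))


if __name__ == "__main__":
    stats = RunningStats()
    for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # hypothetical arrivals
        stats.add(x)
        print(stats.n, stats.mean, stats.sample_sd)
```

Printing after every arrival gives the frame-by-frame view: early points swing both numbers hard, and later points barely nudge them.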

I haven’t demonstrated how clustering changes from one data point to the next. I’ll do that now.

At n=1, we have the first data point, DP1. DP1 is the first center of the first cluster, C1. The radius would be the default radius before any iterating that radius to some eventual diameter. It might be that the radius is close to the data point or at r=1.

At the next data point, DP2, it could have the same value as DP1. If so, the cluster will not move. It will remain stationary. The density of the cluster would go up. But, the standard deviation would be zero.

Or, DP2 would be different from DP1 so the cluster will move and the radius might change. A cluster can handily contain three data points. Don’t expect to have more than one cluster with fewer than four data points.

At n=2, both data points would be in the first cluster. Both could be on the perimeter of the circle. The initial radius would be used before that radius would be iterated. With two points, the data points might sit on the circle at the widest width, which implies that they sit on a line acting as the diameter of the circle, or they could be closer together, nearer the poles of the circle or sphere. C2 would be a calculated point, CP2, between the two data points, DP1 and DP2. The center of the cluster moves from C1 to C2, also labeled as moving from DP1 to CP2. The radius did not change. Both data points are on a diameter of the circle, which means they are as far apart as possible.

The first cluster, CL1, is erased. The purple arrow indicates the succession of clusters, from cluster CL1 centered at C1 to cluster CL2 centered at C2.

P1 is the perimeter of cluster CL1. P2 is the perimeter of cluster CL2. It takes a radius and a center to define a cluster. I’ve indicated a hierarchy, a data fusion, with a tree defining each cluster.

With two data points the center, C2 and CP2, would be at the intersection of the lines representing the means of the relevant dimensions. And, there would be a standard deviation for each dimension in the cluster.

New data points inside the cluster can be ignored. The center and radius of the cluster do not need to change to accommodate these subsequent data points. The statistics describing the cluster might change.

A new data point inside the cluster might be on the perimeter of the circle/sphere/cluster. Or, that data point could be made to be on the perimeter by moving the center and enlarging the radius of the cluster.

The new data point inside the cluster could break the cluster into two clusters, both with the same radius. That radius could be smaller than that of the original cluster. Overlapping clusters are to be avoided. All clusters are supposed to have the same radius. In the n=3 situation, one cluster would contain one data point, and a second cluster would contain two data points.

A new data point outside the current cluster would increase the radius of the cluster or divide it into two clusters. Again, both clusters would have the same radius. That radius might be smaller than that of the original cluster.
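The point-by-point behavior can be sketched in a few lines. This is only one possible reading of the rules above, with loud assumptions: every cluster shares one fixed radius, the center is the centroid of the members, and a point that cannot fit starts a new cluster where the text would also allow enlarging the radius. The function names and coordinates are hypothetical.

```python
import math


def centroid(points):
    """Computed center point (CP) of a cluster: the mean of each coordinate."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def fits(points, r):
    """True if every point lies within radius r of the points' own centroid."""
    c = centroid(points)
    return all(math.dist(c, p) <= r + 1e-12 for p in points)


def add_point(clusters, p, r):
    """Add p to the nearest cluster that can still hold it, else start a new one.

    clusters is a list of lists of points. It is modified in place and returned.
    """
    best = None
    for members in clusters:
        if fits(members + [p], r):
            d = math.dist(centroid(members), p)
            if best is None or d < best[0]:
                best = (d, members)
    if best is None:
        clusters.append([p])   # the point opens a new cluster
    else:
        best[1].append(p)      # the cluster's center moves to the new centroid
    return clusters


if __name__ == "__main__":
    clusters = []
    for dp in [(0.0, 0.0), (1.6, 0.0), (0.8, 0.6), (5.0, 5.0)]:
        add_point(clusters, dp, r=1.0)
        print(dp, [centroid(c) for c in clusters])
```

In this sketch the first three points share one cluster whose center moves with each arrival, and the fourth point opens a second cluster. With a fixed radius, two points sitting exactly on a diameter leave no room for the center to move, so a third point off that line cannot join them.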

With n=3, the center of the new cluster, C3, is located at CP3. CP3 would be on the perimeter of the cluster formerly associated with the first data point, DP1. The purple arrows indicate the overall movement of the centers. The purple numbers indicate the sequence of the arrows/vectors. We measure radius 3 from the perimeter of the third cluster and associate that with CP3, the computed center point of the third cluster, CL3.

Notice that the first cluster no longer exists and was erased, but remains in the illustration in outline form. The data point DP1 of the first cluster and the meta-data associated with that point are still relevant. The second cluster has been superseded as well but was retained in the illustration to show the direction of movement. The second cluster retains its original coloring.

Throughout this sequence of illustrations, I’ve indicated that the definition of distance is left to a metric function in each frame of the sequence. These days, I think of distributions prior to the normal as operating in hyperbolic space; at the normal, the underlying space becomes Euclidean; and beyond the normal, the underlying space becomes spherical. I’m not that deep into clustering yet, but n drives much.

Data points DP1 and DP2 did not move when the cluster moved to include DP3. This does not seem possible unless DP1 and DP2 were not on a diameter of the second cluster. I just don’t have the tools to verify this one way or another.

The distance between the original cluster and the second was large. The distance is much smaller between the second and third clusters.

This is the process, in general, that is used to cluster those large datasets and their snapshot view. Real clustering is very iterative and calculation intensive. Try to do your analysis with data that is normal. Test for normality.

When I got to the fourth data point, our single cluster got divided into two clusters. I ran out of time revising that figure to present the next clusters in another frame of our animation. I’ll revise the post at a later date.

More to the point, an animated view is a part of achieving transparency in machine learning. I wouldn’t have enjoyed trying to see the effects of throwing one more assertion into Prolog and trying to figure out what it concluded after that.

Enjoy.