Assuming Normality

January 17, 2020

Another aspect of the diagram that the last post was written about is the difference between reality and the assumption of normality. Frequentists avoid reality by assuming normality when they normalize the data in their dataset before conducting their analysis. They hide the dynamics of the data collection as well. The dynamics of the geometry of the space is tied to the dynamics of the data collection.

Bayesians assume normality as well.

Statistical inference requires normality, so it is assumed. Most people don’t know how to do statistical inference with other distributions.

Don’t assume normality. Don’t assume data. Don’t assume the rate and result of your data collection.


What that statistical snapshot won't tell you

January 11, 2020

Watching a normal distribution achieve normality from a sample size of one is informative, but we jump over that by using more data and, worse, by assuming normality. Slowing down and looking at less data will tell you where the short tail and the long tail are for a given dimension. The same is true of every subset you take from that normal.

The following graphic was taken from a paper: Peters, O. The ergodicity problem in economics. Nat. Phys. 15, 1216–1221 (2019). doi:10.1038/s41567-019-0732-0. The graphic shows us how the skewed distribution achieves normality. The footprint of the short tail of the skewed distribution does not exceed the footprint of the normal. Investments made in the short tail persist while investments in the long tail vanish. More data points just reveal the error. Or, put another way, growth reveals the error.

I added the tail labels to the diagram. Upon close inspection, the normal is separated slightly from the skewed normal. The skewed normal remains inside the normal.

The averages of the skewed normal from left to right are the mode, the median, and the mean. The median is anchored at the midpoint between the mode and the mean. The median runs from that midpoint on the x-axis to the top of the mode. The steeper the mode is, the closer the skewed distribution is to achieving the normal.
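
That mode, median, mean ordering can be checked numerically. In the sketch below, a lognormal sample is my stand-in for the skewed normal in the figure, chosen only because it makes the ordering easy to see:

```python
import numpy as np

# A right-skewed sample; the lognormal is a stand-in for the skewed
# normal in the figure, not the paper's data.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=0.6, size=100_000)

mean = sample.mean()
median = np.median(sample)
counts, edges = np.histogram(sample, bins=200)
mode = (edges[counts.argmax()] + edges[counts.argmax() + 1]) / 2  # crude histogram mode

print(mode < median < mean)  # right skew: mode, then median, then mean
```

As the skew shrinks, the three averages converge toward a single point, which is one way to watch a distribution approach normality.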

At a given n data points, a distribution achieves normality; any subset of that distribution at that same n will be skewed. The definition of the subset implies the definition of a new aggregate dimension. That in turn implies a new nomial, aka a new peak.

In the next figure, I drew the bases of the distributions. Skewness implies an ellipse. Once normality is achieved, the base is a circle. Every distribution implies a core. Skewness implies tails and shoulders. The gray vertical line is my estimate of where the tail and shoulder transition. The pink circle divides the core and the tails. I labeled this as the shoulder, but, lacking the data, that is the best I could do. The red area is where the normal is outside the ellipse. Those areas are tails that emerged as the distribution approached normality. Investment there will not be lost as the sample size continues to increase.

The core depends on the variance, so it can get larger or smaller. When it gets larger, investments in that area of the tail should be reexamined. The core can be considered a “don’t care.”


Ito Processes

October 30, 2019

Markov processes make decisions based only on the situation in the present. They do not remember how they got where they are. They have no memory.

Ito processes make decisions based on the current situation and a finite past. How much past is up to you. They remember a fixed amount of how they got where they are. Memory here is a parameter.

Markov processes are Ito processes that have memory = 0.
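
A minimal sketch of that distinction, in this post's sense of memory: the next step depends on a sliding window over the last k states, and k = 0 recovers the memoryless Markov case. The drift-toward-the-window-mean update rule is my own illustration, not anything from the articles:

```python
import random

def step(history, k):
    """Next value drifts toward the mean of the last k states; k = 0 is Markov."""
    window = history[-k:] if k > 0 else []
    drift = sum(window) / len(window) if window else 0.0
    return 0.5 * drift + random.gauss(0, 1)

def simulate(k, n=100, seed=7):
    """Run the process for n steps with memory parameter k."""
    random.seed(seed)
    history = [0.0]
    for _ in range(n):
        history.append(step(history, k))
    return history
```

With k = 0 each step ignores the path taken; with k = 5 the process carries a five-element memory. Memory here is literally a parameter.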

A link on Twitter took me to an article on Quanta Magazine’s website, Smarter Parts Make Collective Systems Too Stubborn. The researchers were trying to find out how much memory is too much in the Ito process sense of memory. In a more human sense, memory would be cognitive load. Once upon a time, the rule was 7±2 items.

The article used an illustration to summarize the research and its findings. The outer circle is where the processes started. The numbered circles represent each trial. The numbers in the circles tell us how much memory the associated process had. The goal was denoted by the star in the center of the concentric circles. I annotated the star with a red zero. I also numbered each circle from the center outward. There are six concentric circles.

Then, I turned the diagram into a table. Then, I annotated the table in terms of the shape of the distribution. The distribution exhibited short and long tails. The distribution is skewed. This implies that the distribution is asymmetric. And, that implies that the space of the distribution is hyperbolic. The grey and red boxes indicate the results of each trial. The star, or zero, was approached, but never reached. The red box is a median, not a mean. The short tail has a mode associated with it. And, the mean is on the long tail side of the distribution.

The table tells us that the ideal cognitive load is 5 elements. The table demonstrates a saddle point at 5 elements. Performance decays beyond 5 elements. This is typical of what happens when optimal values, the value of the game, are exceeded.

The results of the reported research are contrary to earlier findings known as the wisdom of crowds. The article sees values exceeding 7 elements as being too uncorrelated. We typically keep our batch sizes small, so the process maintains its correlations.

The diagram and the table show that the memory parameter was varied from 1 to 13 elements. Machine learning does hill climbing. It would have discovered that peak performance was achieved when the memory parameter was set to 5 elements.
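
A minimal hill-climbing sketch over the memory parameter, with a hypothetical performance curve standing in for the trial results (the curve peaking at 5 is my assumption, shaped to the table, not the paper's data):

```python
def performance(memory):
    # Hypothetical stand-in for the trial results: performance peaks at
    # 5 elements and decays beyond it, as the table suggests.
    return -(memory - 5) ** 2

def hill_climb(start=1, low=1, high=13):
    """Move to the better neighbor until neither neighbor improves."""
    m = start
    while True:
        neighbors = [n for n in (m - 1, m + 1) if low <= n <= high]
        best = max(neighbors, key=performance)
        if performance(best) <= performance(m):
            return m
        m = best

print(hill_climb())  # climbs from 1 and stops at 5
```

Starting from either end of the 1-to-13 range, the climb converges on the same peak, which is all the researchers' sweep would have needed.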

In some other reading, a formerly Boolean classification was parameterized and widened greatly. As an innovation, parameterizing a formerly unparameterized phenomenon would be a continuous innovation until new theory was needed. There can be multiple parameterizations, with each parameterization having its own logic and its own ontology. Each such parameterization is a standalone construct. They can be discontinuous innovations.

The concentric circles are indicative of a set of Fourier transforms. Each circle can be thought of as a waveguide filtering out frequencies that would not fit inside that circle.

The technology adoption lifecycle phase is a set of Fourier transforms. Each phase limits the cognitive load of the organization in that phase. Some work done in a specific phase is specific to that phase. Some business processes work in a phase. Others do not.

For prospects, customers, and clients, task sublimation is the primary organizer of phases. Different populations, different markets require different task sublimations. Different populations will exhibit different cognitive loads. Don’t think easy is what is needed. Remember that the star, the goal in the diagram was never reached. It was approached.

Parameterize the problem. Watch as each parameter approaches, converges, and then diverges. The goal might be beyond the value of the game. Watch performance improve and later degrade.


Innovation Notes from Forbes

September 23, 2019

From September 19th, 2019, “Everyone Wants To Innovate, But No One Wants To Change” gives grim notice on the state of innovation in the orthodoxy. Citing an Accenture study, the article claims that “… innovation spending has declined by 27% over the past five years.” It goes on to narrow its focus to high-growth companies that are change-oriented, outcome-led, and disruption-minded. So what changed? Not much. Led by outcome? Is there any other way these days? And, worse, the ever-bogus disruption-minded. Sorry, but I don’t give a damn about disruption.

None of those things have anything to do with innovation. Well, disruption, in the Christensen sense, denotes the end of the technology adoption lifecycle and the death of the category involved. Foster disruptions happen in the early phases of the TALC and Christensen disruptions happen in the late phases. Foster’s was not competitive. Yes, it was bad when it happened to an industry, but it didn’t happen to yours. Hell, you just birthed your category and you have fifty years or more to go until the Christensen disruption will happen to your company, unless you don’t play that game. I’ll be bowling instead. I will keep my bowling alley full. And, I will get that company acquired long before that Christensen disruption happens. We will birth many discontinuous technological innovations before then.

High-growth companies? Really. Their addressable market is fixed, so what does high growth really mean? It has always been that the initial sale generated more revenues than upgrade/subscription sales. Growth means the speed at which you are making those initial sales, which means that high growth means another day closer to the complete consumption of that addressable market. It’s not sustainable. It’s a buzzword. The CEO doesn’t care. They will be gone when the buzzword doesn’t pan out.

Change? No. The TALC isn’t about change. It’s a process. Different parts of the company handle different phases. There are interfaces. There is preparatory work. There is the handoff. The staff moves or stays put depending on their role. The processes in the phase evolve, but don’t exit the phase. The prospects might be in the next phase, but we sell them when we get there. Things that look like change don’t change. But, both processes and populations are phase specific. They don’t change. They wait.

Organizational structure helps to isolate the phase specific changes. In a given phase specific division, the processes are specific. They don’t change within that organization. They change across organizations. The prospects don’t change within that organization either. Again, they change across the organizations. They change when it is time to move to the next phase. The change happens across the interface.

“Outcome led” is one of those buzzwords of the month. Different phases have different outcomes.

The first phase, the technical enthusiast phase demands a scientist-driven or licensed engineer-driven innovation that is research based. It isn’t stuck in a given industry beyond our ability to implement it. The outcome is a new discontinuous innovation.

One of the primary reasons for the low payoff of the innovations looked at in that Accenture study was that the innovation was continuous, which is done for cash, not economic wealth, and involves no scientists or engineers. It is inexpensive. It pivots. There is no commitment to anything. It is more of whatever the company already does. It has to change, because the people in the company today are new. The people from the earlier phases of the TALC have been laid off. They and their process knowledge are gone. The company learned, then it forgot. It doesn’t have repeatable phase-specific processes. It lost those when it lost those phase-specific people.

Another reason is that they are copying their competitors. That copying only looks innovative when the act of managing gets confused with innovation. And, their VCs would not have invested in anything that needed market creation. Market creation is a mess; tiny populations, and the understating mathematics related to those populations, taught these VCs not to go there. The rule for these VCs is that the market must exist. It’s where they get the data that they use for their financial projections. It’s what they know.

The second phase is the bowling alley, where we find six early adopters, one at a time, in six different and unrelated verticals, each with the early adopter’s own value proposition and client product visualization. That takes years, and it’s pay as you/they go. Notice we do not choose the value proposition or product visualization. We choose the technology, the carrier. We enter that vertical working for the early adopter. We don’t know anything about the vertical beyond seats and dollars, and the early adopter’s company’s position in the industrial classification tree. That position has to be in the middle of the tree. Generalized functionality is up the tree. Specialized functionality is down the tree. So seats and dollars, and position, is all we need to know.

Then, we send in the ethnographers to capture the cognitive model of the vertical specific to the company’s position in the tree. This is an application specific to the functional units and business units of the early adopter’s company within the vertical. It is not an IT proposition. It is not the carrier that we are selling. We are using the carrier to implement the carried content. This is not about the enterprise architecture yet. That is transitional work at the end of the phase. There are no competitors. We control the underlying technology. It is ours and ours alone. It does not become the early adopter’s.

The outcome here is the early adopter’s successful achievement of their value proposition. The early adopter must succeed in their own terms. They must have an inductive success story. We have to sell to the rest of the vertical. And, the early adopter is known as a risk taker that takes more risk than the rest of the vertical. Much work remains before we get to the next outcome: dominance of the application in the early adopter’s vertical.

Then, comes the work to convert the vertical application into a horizontal application. This can happen after success in the vertical. We will be in that first vertical for more than a decade, so the enterprise architecture gets built out over that decade. This is the third outcome of the early adopter phase.

And, there is a fourth outcome, once all six verticals have achieved their third outcome, a successful transition to the tornado. The tornado is the gate to the IT horizontal. Getting ready for the tornado involves rewriting six different vertical applications into a common application carrier and template-based specifics to deliver the carried content of the existing six vertical applications. We must aggregate the six vertical populations into a single population in the IT horizontal.

Then, we are off to the Tornado/IT horizontal. The carried content people stay behind and move with the early adopter’s companies. The cognitive models continue to change, so our people push those changes into the IT horizontal application as mass customizations. The IT horizontal has its own cognitive model from its own IT population.

As we have spare capacity in the bowling alley, we seek our next technology and start the whole process again starting with the first phase. We are always innovating discontinuously in an ongoing manner. Nobody gets laid off. We need those people again shortly, so we are not laying them off. We are retaining the learning that we embedded in our staff and our processes.

The idea that we must innovate at scale is an entirely different way of looking at the world, and an entirely different way of innovating, the way of the continuous innovation. I’ll stick with discontinuous innovation; the return is much higher.

This separation concept was Christensen’s best idea. But, the business orthodoxy did not accept this idea. The organization that Christensen built to sell that idea did not succeed. The people from that organization are still out there making their own names in the innovation press. They pivoted. Even Christensen pivoted. It’s like serial entrepreneurs. They failed. Worse, they delivered us into the hands of the orthodoxy as it refuses to innovate discontinuously, and insists on continuous innovation that barely keeps their category alive for the next four quarters.

In the meantime, the BLS tells us that 48% of Americans have left the workforce. This is due to globalism and management’s refusal to create new careers and new value chains. It is also due to the refusal of neo-VCs to do the real work of the VCs before them, who invested to change the world, rather than investing in the quick, riskless propositions found in a spherical geometry that understates its safety, a riskless safety in which small returns are the best that can be had.

Those people that left the workforce are a force that will lead us to the next world war. They need prosperity. We’ve already seen despair. Innovation matters. That 27% decline should be telling us loudly that we are not changing the world. The innovation press screams at us with quotes from Web 1.0ers that did do discontinuous innovation, rather than what we do these days. We’ve used arguments from those days to get states to subsidize “innovators” whose best claim will be pivoting, exits, serial entrepreneurship, but not one more job outside of tech and beyond the VCs’ quick turns to rapid exits, beyond any real prosperity for anyone beyond the VCs and founders.

We can regain prosperity by slowing down, following the process, and reading the journals to discover the road not taken, the road of discontinuous innovation.


August 28, 2019

Twitter brought it up again, n-dimensional packing, with a link to An Adventure in the Nth Dimension in American Scientist. An earlier article in Quanta Magazine, Sphere Packing Solved in Higher Dimensions, kept the problem in mind.

So why would a product manager care? Do we mind our dimensions? Do we know our parameters? Do we notice that the core of our normal distribution is empty? Are our clusters spherical or elliptical? Do we wait around for big data to show up while driving our developers to ship?

I replied to a comment in the first article. The article never touched on the fact that pi is not a constant. The “pi is a constant” assertion lives in L2. L2 is a particular Lp space where p=2. L2 is our familiar Euclidean space. When we assert a random variable we are in L0. Our first dimension puts us in L1; our second, L2; our third, L3; and so forth.

I’ve been drawing circles as the footprint of my normal distributions. Unless I specifically meant a two dimensional normal, they should have been squircles. A squircle is a squared-off circle, or a square with circular corners.

The red object is a squircle in L4. That is the fourth dimension. The n here refers to the dimension.

The blue object is a circle in L2. We could also consider it to be a squircle in L2.

If they are both footprints of normal distributions, then the blue distribution would be a subset of the red distribution. Both have enough data points separately to have achieved normality. Otherwise, they would be skewed and elliptical.

The L2 squircle might be centered elsewhere and it might be an independent subset of the superset L4. That would require independent markers that I discussed in the last post. Independence implies an absence of correlation. There is no reason to assume that the footprints of independent subsets share the same base plane.

The reason I added a circle to the diagram of the L4 squircle was to demonstrate that the circumference of the L4 squircle is larger than that of the L2 squircle, aka the circle. Given that π is defined as the ratio of the circumference to the diameter, π = C/d = C/(2r), that implies that every Lp space has a unique value for π. This was not discussed in the article that led to this blog post. It turns out that dimension n parameterizes the shape of the footprint of the normal distribution.
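
That per-space π can be checked numerically. The sketch below measures the perimeter of the unit Lp circle in its own Lp metric and divides by the diameter; the signed-power parametrization and the measuring convention are my choices, not the article's:

```python
import numpy as np

def pi_p(p, n=100_000):
    """Perimeter of the unit Lp circle, measured in the Lp metric, over its diameter."""
    t = np.linspace(0.0, 2.0 * np.pi, n)
    # Signed-power parametrization keeps each point on |x|^p + |y|^p = 1.
    x = np.sign(np.cos(t)) * np.abs(np.cos(t)) ** (2.0 / p)
    y = np.sign(np.sin(t)) * np.abs(np.sin(t)) ** (2.0 / p)
    dx, dy = np.diff(x), np.diff(y)
    perimeter = np.sum((np.abs(dx) ** p + np.abs(dy) ** p) ** (1.0 / p))
    return perimeter / 2.0  # the diameter of the unit circle is 2

print(pi_p(2))  # approximately 3.14159...
print(pi_p(1))  # 4.0: the L1 "circle" is a diamond
print(pi_p(4))  # larger than pi, matching the larger circumference above
```

Under this convention, p = 2 gives the familiar π and every other p gives something larger, up to 4 at the extremes.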

The dimension n would differ between supersets and subsets. Each dimension achieves normality on its own. Don’t assume normality. Know which tail is which if the dimension is not yet normal. Every dimension has two tails until normality is achieved. This implies that the aggregate normal that has not achieved normality in every dimension is not symmetric.

Lp spaces are weird. When the dimension is not an integer, that space is fractal.

The normal distribution has a core, a shoulder, and a tail. Kurtosis is about shoulders and tails. This is a relatively new view of the purpose of kurtosis. More importantly, the core is empty. The mean might be a real number when the data is integers. The mean is a statistic, not necessarily data.

When we talk about spheres, the high-dimensional sphere is empty. As the dimension increases, the probability mass migrates to the corners, which become spikes in the high-dimensional sphere. There is some math describing that migration. The spikes are like absolute values in that they are not continuous. There is no smooth surface covering the sphere. It’s one point to another, one tip of the spike to the next. You have to jump/leap from one to the next. Do we see this with real customers? Or real requirements?
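
One view of that migration can be estimated with a Monte Carlo sketch: the fraction of a cube's volume that its inscribed sphere still holds collapses as the dimension grows, so the mass ends up in the corners. The sampling setup here is my own illustration:

```python
import numpy as np

def inscribed_sphere_fraction(dim, samples=200_000, seed=1):
    """Monte Carlo estimate of the cube volume remaining inside the inscribed sphere."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(-0.5, 0.5, size=(samples, dim))  # unit cube around the origin
    inside = (points ** 2).sum(axis=1) <= 0.25  # inscribed sphere of radius 1/2
    return inside.mean()

for dim in (2, 5, 10):
    print(dim, inscribed_sphere_fraction(dim))
```

At two dimensions the sphere holds about 79% of the cube; by ten dimensions it holds well under 1%. The sphere empties; the corners win.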

Sphere packing with spiky spheres means that we can compress the space since the spikes interleave. In our jumping from one spike to the next and from one sphere to another, how will that make sense to a user?

This graph from the American Scientist article is the historical flight envelope of sphere packing. Apparently, nobody had gone beyond 20 integer dimensions. The spheres look smooth as well.

I took statistics decades ago. Statistics was a hot topic back then. Much work was being done then. I’m surprised by the parameterizations that happened since then. Lp space is indexed by n, the number of dimensions, a parameter. Things that we think of as constants have become parameters.

Parameters are axes, aka dimensions. Instead of waiting until your data pushes your distribution to a particular parameter value, you can set the parameter, generate the distribution, and explore your inferential environment under that parametric circumstance. The architect Peter Eisenman used generative design. He did this by specifying the parameters or rules and observing his CAD system animate a building defined by those parameters and rules. Similarly, you can check your strategies in the same way, long before you have the data or the illuminators that lead to that data.

Many of the phase changes that we call the technology adoption lifecycle involve independent markers, or data that never got into our data. It is all too easy to Agile into code for a population that we shouldn’t be serving yet. The mantras about easy fail to see that easy might have us serving the wrong population. The cloud is the easiest. It is for phobics. It is not early mainstreet. Our data won’t tell us. We were not looking for it. This happens given big data or not.

The more we know, the less we knew. We didn’t know π was a parameter.


Independent Markers

August 11, 2019

Well, as usual, Twitter peeps posted something, so I dived in and discovered something I’ve barely had time to dive into. Antonio Gutierrez posted Geometry Problem 1443. Take a look.

It is a problem about the area of the large triangle and the three triangles comprising the larger triangle.

A triangle has a circumcenter and an incenter; a triangle has many centers. The circumcenter is the center of the circle around the large triangle. I’ve labelled that circle as the superset. The incenter is the center of the circle inside the large triangle. That circle is the subset.

It doesn’t look like a statistics problem, but when I saw the symbol for perpendicularity, implying that the subset is independent of the superset, it quickly became a statistics problem and a product marketing problem.

The line AB is not a diameter, so the angle ACB is not a right angle. If AB were a diameter, angle ACB would be a right angle. The purple lines run through the circumcenter, the center of the circle representing the superset, which implies that the purple lines are diameters. I drew them because I was thinking about the triangle model where triangles are proofs. And, I checked it against my game-theoretic model of generative games. The line AB is not distant from the middle diameter line. This is enough to say that the two thin red lines might converge at a distant point. As the line is moved further from the diameter, the lines will converge sooner. Generally, constraints will bring about the convergence, as the large triangle is a development effort and the point C is the anchor of the generation of the generative space. The generative effort’s solution is the line AB. The generative effort does not move the populations of the subset or superset.

O is the circumcenter of the larger triangle. A line from O to A is the radius of the large circle representing the superset. I is the incenter of the large triangle. A line from I to D is the radius of the small circle representing the independent subset.

Now for a more statistical view.

When I googled independent subsets, most of the answers said no. But I found a book, New Frontiers in Graph Theory, edited by Yagang Zhang, that discussed how the subset could be independent. I have not read it fully yet, but the discussion centers around something called markers. The superset is a multidimensional normal. The subset is likewise, but the subset contains markers, these being additional dimensions not included in the superset. Those markers adjust a distribution’s x-axis relative to the y-axis, something you’ve seen if you read my later posts on black swans. And, this x-axis vertical shift, or movement of the distribution’s base, is also what happens with Christensen disruptions, aka down-market moves. In both black swans and Christensen disruptions, the distribution’s convergences with the x-axis move inward or outward.

In the above figure, we have projected from the view from above to a view from the side. The red distribution (with the gray base), the distribution of the subset, is the one that includes the markers. The markers are below the base of the superset. The markers are how the subset obtains its independence. The dimensions of the markers are not included in the superset’s multinomial distribution. The dimension axes for the markers are not on the same plane as those of the superset.

Now, keep in mind that I did not yet get to read the book on these markers and independent subsets. But, this is my solution. I see dimensions as axes related by an ontological tree. Those markers would be ontons in that tree. Once realized, ontons become taxons in another tree, a taxonomic tree.

Surveys live long lives. We add questions. Each question could be addressing a new taxon, a new branch in the tree that is the survey. We delete questions. Data enters and leaves the distribution, or, in the case of markers, disappears below the plane of the distribution.

Problems of discriminatory biases embedded in machine learning models can be addressed by markers. Generative adversarial networks are machine learning models that use additional data to grade the original machine learning model. We can call those data markers.

I am troubled by perpendicularity implying independence. The x-axis and the y-axis are perpendicular until you work with bases in linear algebra. But, the symbol for perpendicularity did not lead me down a rabbit hole.


Proteins in Evolution for PMs

July 13, 2019

Twitter linked me to Emanuel Derman‘s review article on Black-Scholes, Trading Volatility, in the Inference Review journal. From there I looked around the journal and found The New View of Proteins article. A quote from the New View article leads back to a need for changes in the organizational structure of companies that intend to innovate discontinuously in an ongoing manner:

“… Yet one organism may be five mutations away from a new useful feature, and the other organism, after collecting neutral changes, only one. In adaptive hardship, it will be in a much better position to evolve. Neutral evolution and Darwinian evolution, instead of being exclusive, can operate in symbiosis. …”

Christensen called for separation in his Innovator’s Dilemma, his best idea, which didn’t get traction. I read that book when it had first hit the bookstores. It didn’t get traction because the business orthodoxy of the typical business wouldn’t allow it. Late mainstreet is typical of that business orthodoxy, which fosters the notions of innovating at scale and strategic alignment, both of those being contrary to the technology adoption lifecycle. Christensen didn’t go into what separation was in any implementable detail.

Well, proteins innovate. And, proteins don’t care about the business orthodoxy. Proteins have to innovate on demand, and do so without scale. Proteins are ready. The article explains how. The short answer is that a protein does more than one thing. It does its main thing, and does other irrelevant things, those irrelevant things being neutral changes that have a symbiotic, not free, relationship with the main process of the protein. When fitness changes, those relationships can change in ways that provide a rapid response.

The technology adoption lifecycle takes a discontinuous innovation from birth to death, which redefines fitness in every phase. The continuous innovations we see today skip the early phases, demanding instead a near instantaneous innovation at scale in the late mainstreet phase. Well, proteins do not do that. They birth biological processes ahead of the need just because the generative space, a niche, is there to be entered. That is a speciation process, a birthing of a category in isolation long before its explosion into the wider world, a birthing before the chasm long, again, LONG, before it is an innovation “at scale” in the late mainstreet phase.

WARNING: When I talk about chasms, I’m talking about Moore’s chasm in the first edition of his Crossing the Chasm book. There have been a second edition and a third. They are not the same books. The third edition is why so many people think they are crossing a chasm when they are not. The innovation must be discontinuous before you have a chasm to cross.

Almost no one innovates discontinuously these days. Nicolas Negroponte complained about that in one of his addresses where he mentions the Media Lab. Nobody there these days is trying to change the world. They just want to make some money.

After the Web 1.0 bust, Moore moved over to the business orthodoxy to survive. His books always followed the development of his methodology. He changed the message to sell more books. What was lost was the process underlying discontinuous innovation. He told us how. We got distracted.

But, proteins are making the case again. Fitness changes. Fitness can undergo discontinuous change. Evolution forces proteins to follow suit.

It doesn’t force product strategists to follow suit. It doesn’t force product managers to follow suit. Nor does it make the underlying technologies and their user facing products follow suit.

It’s not tough. It’s not efficient. Having a capability, a process, a staff trained to use that capability is necessary. The neutral symbiosis will save you from the long transition Apple has undertaken to get to its next thing. Microsoft stumbled for a while as well. The lack of neutral symbiosis is part of the incumbent’s problem.

When I tweet my Strategy as Tires tweets, the speaker is a CEO in a company doing discontinuous innovation in an ongoing manner. He keeps the bowling alley full, the functions in their phases, and the categories move on before they die. And, yes, those neutral symbionts are kept lying in wait for their moment to pounce. Take it that change is incessant. They are ready.

Mostly, we are stuck in the past, while we quote the movers, those that were ready when fitness changed.


Pythagorean Theorem for PMs

June 21, 2019

What? Well, my math review forces me to go read about things I know. And, things I didn’t know, or things I never bothered to connect before.

In statistics, or in all math, independent variables are orthogonal. And, in equations, the variables on one side of the equals sign are independent, and the variables on the other side are dependent. Independent and dependent variables have relationships.

Now, change subjects for a moment. In MS project or in all projects, you have independent tasks and dependent tasks. And, these independent and dependent tasks have relationships.

Statistics was built on simple math. Simple math like the Pythagorean Theorem. You can argue about what is simple, but the Pythagorean Theorem is math BC, aka before calculus.

Distance is one of those simple ideas that gets messy fast, particularly when you collect data and you have many dimensions. The usual approach is to add another dimension to the Pythagorean Theorem. That’s what I was expecting when I read an email that sent me out to the Better Explained blog. The author of that blog always has another take. I read this month’s post on another subject and went to look for what else I could find. I found a post, “How to Measure Any Distance with the Pythagorean Theorem.” Read it. Yes, the whole thing. There is more relevant content than I’m going to talk about. The author of this post assumes a Euclidean geometry, which, around here, means my data has achieved normality.

He builds up slowly. We’ll just dive into the deep end of the cold pool. You know this stuff, or, like me, assume you know this stuff.

The Multidimensional Pythagorean Theorem.

In this figure, I labeled the independent and dependent variables. This labeling assumed finding z was the goal. If we were trying to find b, then b would be dependent so the labels would be different.

In the software as media model, a would be the carrier code, and b would be the carried content. Which implies b is the unknown situation. The developer doesn’t know that stuff yet. And, without an ethnographer, might never know that stuff. Steve Jobs knew typography; the developers of desktop publishing software 1.0 didn’t. But, don’t worry, the developers won the long war with MS Word for Windows, which didn’t permit graphic designers to specify a grid, something that could be done in MS Word for DOS. Oh, well.

Those triangles would be handoffs, which is one of those dreaded concepts in Agile. The red triangle would be your technical writer; orange, your training people or marketing. However you do it, or they do it.

Independent and dependent variables in a multidimensional application of the Pythagorean Theorem

There are more dependent variables in the equation from the underlying source diagram, so I drew another diagram to expose those.

The independent variables are shown on a yellow background. The dependent variables are shown on a white background. Notice that the dependent variables are hypotenuses.
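The dependent-variables-as-hypotenuses reading can be sketched in code: each added dimension forms a new right triangle whose hypotenuse becomes a leg of the next triangle. A minimal sketch; the function name and values are mine, not from the diagrams.

```python
import math

def nested_distance(*legs):
    """Fold the Pythagorean Theorem across n dimensions:
    each intermediate hypotenuse becomes a leg of the next triangle."""
    hypotenuse = 0.0
    for leg in legs:
        hypotenuse = math.hypot(hypotenuse, leg)  # sqrt(h**2 + leg**2)
    return hypotenuse

# Two dimensions: the classic 3-4-5 right triangle.
print(nested_distance(3, 4))     # 5.0
# Three dimensions: sqrt(1 + 4 + 4) = 3.
print(round(nested_distance(1, 2, 2), 6))  # 3.0
```

The intermediate hypotenuses are exactly the dependent variables in the second diagram; only the final one usually gets a name.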

In an example of linear regression that I worked through to the bitter end, new independent variables kept being added. And, the correlations kept being reordered. This was similar to the order of factors in a factor analysis, which runs from steeper and longer to flatter and shorter. There was always another factor because the budget would run out before the equation converged with the x-axis.
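The “always another factor” effect is easy to see numerically: for nested least-squares fits, R² never decreases as predictors are added, so there is always a reason to add one more. A sketch with synthetic data of my own choosing, not the example from the post:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                              # three candidate predictors
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)   # the true model uses two

def r_squared(X_sub, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# Nested models: R^2 is nondecreasing as predictors are added,
# even when the added predictor explains nothing.
r1 = r_squared(X[:, :1], y)
r2 = r_squared(X[:, :2], y)
r3 = r_squared(X[:, :3], y)
print(r1 <= r2 <= r3)  # True
```

The budget, not the math, is what stops the process.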

This particular view of the Pythagorean Theorem gives us a very general tool that has its place throughout product management and project management. Play with it. Enjoy.

Box Plots Again

June 3, 2019

I went through my email this morning and came across an email from Medium Daily Digest. I don’t link to them often, but The 5 Basic Statistics Concepts Data Scientists Need to Know looked like it might be a good read. Big data diverges from statistics. The underlying assumptions are not the same.

So my read began. The first thing that struck me was a diagram of a box plot. It needed some interpretation. The underlying distribution is skewed. If the distribution were normal, the median would be in the middle of the rectangle. The median would be slightly closer to 1.0. You can find this by drawing diagonals across the rectangle. They would intersect at the mean. In a normal that has achieved normality, the mean, the median, and the mode converge. You will see this in later diagrams. The box plot is shown here in standard form.

Each quartile contains 25 percent of the dataset.

Skewed distributions should not be prevalent in big data. So we are talking about small data, but how can that be, given that the box plot is typically used in daily stock price reporting? We’ll get to that later.
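The daily-stock-price reading of a box plot can be reproduced from the five-number summary alone. A small sketch; the prices here are hypothetical, invented for illustration, not data from the post:

```python
import statistics

# Hypothetical daily closing prices -- illustrative only.
prices = [22.1, 22.4, 22.5, 22.7, 22.8, 23.0, 23.1, 23.4, 24.0, 25.2]

q1, median, q3 = statistics.quantiles(prices, n=4)  # quartile cut points
low, high = min(prices), max(prices)                # whisker ends, ignoring outliers

# If the median sits below the center of the box, the upper half of
# the data is stretched out: a long tail on the high side.
box_center = (q1 + q3) / 2
print(median < box_center)  # True -- long right tail
```

Each of the four segments of that summary covers 25 percent of the data points, which is the frequency information the box plot quietly encodes.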

In big data, normality is usually assumed. I got on this “is it normal” kick when I read a big data book telling me not to assume normality. I have called it out ever since, as I’m going to do in this post. Normality takes at least 2048 data points in a single dimension. So five dimensions require 5×2048, or 10,240 data points. When we focus on subsets, we might have fewer than 2048 data points, so that gives us a skewed normal. In n-dimensional normals, the constituent normals that we are assuming are normal are not, in fact, normal yet. They are still skewed.

We mostly ignore this at our peril. When we make statistical inferences, we are assuming normality because the inference process requires it. Yes, experts can make inferences with other distributions, or with no distribution at all, but we can’t.

I’ve read some papers on estimating distribution parameters where the suggested practice is to compute the parameters using a formula giving you the “standardized” mean and standard deviation.

I revised the above figure to show some of the things you can figure out given a box plot. I added the mean and mode. The mode is always on the short tail side of the distribution. The mean is always on the long tail side of the distribution. If the distribution had achieved normality, the median would be in the middle of the box. As it is, the median is below the center of the rectangle so it will take more data points before the distribution achieves normality. In a skewed normal, the mean and mode diverge symmetrically from the median. Once normality is achieved, the mode, mean, and median would converge to the same point. There would be a kurtosis of 3, which indicates that the tails are symmetrical. That implies that the curvature of the tails are the same as well.
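The “median below the center of the box” reading has a standard name: Bowley’s quartile skewness, computable from the box plot alone. A sketch with illustrative quartile values of my own:

```python
def bowley_skewness(q1, median, q3):
    """Quartile (Bowley) skewness: 0 for a symmetric box,
    positive when the median sits toward Q1 (long tail on the high side)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Symmetric box: median midway between the quartiles.
print(bowley_skewness(1.0, 2.0, 3.0))            # 0.0
# Median closer to Q1: positive skew, long tail on the high side.
print(round(bowley_skewness(1.0, 1.4, 3.0), 3))  # 0.6
```

As the distribution achieves normality, this statistic heads to zero, which is the box-plot version of the mode, median, and mean converging.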

That curvature would also define a torus sitting on top of the tails. When the distribution is not yet normal, or is skewed, that torus would be a cyclide. A torus has a constant tube radius, while a cyclide is a tube that starts with a small radius, which increases as it is swept around the normal from the short tail to the long tail. The long tail is where the tube has the largest radius. Neither of these is shown in this diagram. That cyclide is important over the life of the distribution, because it orients the normal. Once the distribution achieves normality, that orientation is lost due to symmetry, or not. That challenges some simplifying assumptions I will not address today, as in further research is required. But, accepting the orthodox view, symmetry makes that orientation disappear.

A skewed normal as it appears in a box plot.

I showed, in black, where the core of the normal would be. I also indicated where the shoulder of the distribution would be. Kurtosis and tails start at the shoulder. The core is not informative. I used a thick red arrow pointing up to show how the mode, median, and mean would converge or merge. In a skewed distribution, the median is leaning over. As the distribution becomes more normal, it stands up straighter. Once normality is achieved, the median is perpendicular to the base of the distribution. Notice that the short tail does not move. I also used a thick red arrow pointing down to show how the long tail will contract as the distribution becomes normal.

Invest on the stable side of the distribution, or infer on the stable side. Those decisions will last long after normality is achieved.

The next figure shows how to illustrate the curvature of the tails given just the box plot and some assumptions of our own.

Tails, curvatures, and the cyclide.

We begin here on the axis of the analyzed dimension, shown in orange. I’ve extended this horizontal axis beyond the box plot, shown in red.

Consider the distance from the mean to the maximum value in the box chart, the point at the top of the diagram marked with a “^” symbol rotated ninety degrees. This is also labeled, in blue, as a point of convergence. That distance is one half the side length of the associated square, shown in red. The circle inside that box represents the diameter of the cyclide tube at the long tail.

Now consider the distance from the mode to the minimum value in the box chart, the point at the bottom of the diagram marked with a “^” symbol and labeled as a point of convergence. Again, that distance is one half the side length of the associated square, which contains a circle representing the diameter of the cyclide tube at the short tail.

On both of the circles, the blue portions represent the curvatures of their respective tails. Here is where some assumptions kick in, as well as the limitations of my tools. There are diagonals drawn from the mean and the mode to the origins of the respective curvature circles. Each has an angle associated with it. The blue curvature lines are not data driven. The curves should probably be lower. If we could rotate those red boxes in the direction of the black circular arrow while leaving the circles anchored at their convergence points, and clip the blue lines at the green clipping lines, we’d have better curvatures for the tails.

A tube would be swept around from the small circle to the large circle and continuing around to the small circle.

Here the light blue lines help us imagine the curvature circles being swept around the core of the distribution. This sweep generates the cyclide. This figure also shows the distribution as being skewed. The median eventually stands up perpendicular to the base plane. The purple line equates this standing up of the median with the moment when the distribution has enough data points to no longer be skewed. The distribution would finally be normal. The cyclide would then be a torus. The short tail radius would have to grow, and the long tail radius would have to shrink.

So how does a multidimensional normal end up with a two-dimensional distribution and a one-dimensional box chart? The box chart shows the aggregation of a lot of information that gets summarized into a single number, the price of the share. Notice that frequency information is encoded in the box chart quartiles, but that is not apparent.

Notice that outliers might extend the range of the dimension. They are not shown. The box chart reflects the market’s knowledge as of the time of purchase. Tomorrow’s information is still unknown. The range of the next day’s information is unknown as well. The number of data points will increase so the distribution could well become normal. But, the increase in the number of data points tomorrow is unknown.

Had we built products and income streams into the long tail, we would be out of luck.



May 13, 2019

When statistics was invented, it was based on some simple math, the mathematics of distance. The world was Euclidean. Truth was binary. Inference was based on normal distributions, areas under the curves, and distances. Those normals were symmetric. There were no long tails, and no short tails. Pi was a constant.

Now, we have Poisson distributions, small data, and big data. We have hyperbolic spaces, Euclidean spaces, and spherical spaces, among many spaces. We have linear spaces and non-linear spaces. We have continuous spaces and finite spaces. Truth is no longer binary. Inference is still based on normal distributions. Those normals become asymmetric. Skewness and kurtosis give us long tails and short tails. Pi is variable. And, the total probability mass is tied to pi, so it is also variable, running from less than one to more than one.

The number of data points, n, drives our distributions differentially. “We are departing our estimated normal. But, we will be travelling through a skewed normal for a while.” You have to wonder if that probability mass is a gas or a solid. Is the probability mass laid out in layers as the modes and their associated tails move?

It’s too much, but the snapshot view of statistics lets us ignore much, and assume much.

This figure started out as a figure showing what a normal distribution in the Lp geometry looks like when p = 0.5. This is shown in blue. This is a normal in hyperbolic space. The usual normal that we are familiar with happens in L2 space, or Lp space where p = 2. This is the gray circle that touches the box that surrounds the distribution. That circle is a unit circle of radius 1.

The aqua blue line in the first figure shows the curve at, say, p = 0.1. The figure immediately above shows what happens as p increases: the line approaches and then exceeds p = 2. At p = 1, the line would be straight, and we would have a taxicab geometry. The value of p can exceed p = 2. When it does so, the distribution has entered spherical space. The total probability density equals 1 at p = 2. It is more than 1 when p < 2. It is less than 1 when p > 2.
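The footprints in the figure can be checked numerically. On the diagonal x = y, the Lp unit circle |x|^p + |y|^p = 1 sits inside the Euclidean unit circle when p < 2 and bulges outside it when p > 2. A small sketch; the function name is mine:

```python
def diagonal_radius(p):
    """Distance from the origin to the Lp unit circle along x = y.
    Solving 2 * x**p = 1 gives x = 0.5 ** (1 / p); radius = x * sqrt(2)."""
    x = 0.5 ** (1.0 / p)
    return x * 2 ** 0.5

print(round(diagonal_radius(0.5), 4))  # 0.3536 -- pinched well inside the unit circle
print(round(diagonal_radius(2.0), 4))  # 1.0    -- the Euclidean unit circle itself
print(round(diagonal_radius(4.0), 4))  # 1.1892 -- bulging outside it
```

The p = 0.5 curve pinching toward the axes is the pictured hyperbolic footprint; p = 1 gives the straight taxicab diamond; past p = 2 the footprint swells toward the surrounding box.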

The process starts with that Dirac function where the line goes to infinity. Then, the probability mass flows down into an eventual normal. That eventual normal traverses the Lp geometries. The geometry is hyperbolic until the Lp geometry reaches L2, where p = 2; while it is hyperbolic, the total probability mass is more than one. The L2 geometry is the standard normal. In the L2 geometry, the total probability mass is one. Then the Lp geometry exceeds p = 2. This is the spherical geometry, where the probability mass migrates to the shell of the sphere, leaving the core empty. At this point the total probability mass is less than one.

Notice that the early adopter phase of the technology adoption lifecycle happens when the total probability mass is more than one. And, the late mainstreet and later phases happen when the total probability mass is less than one. These changes in geometry mess with our understanding of our financial projections. The early adopter phase is relative to discontinuous innovations, not continuous innovations, as the latter happen in the mainstreet or later phases. The early adopter phase is commonly characterized as being risky, but this happens because hyperbolic spaces suppress projections of future dollars, and because of the problems of investing in skewed distributions where the long tails contract while the short tails remain anchored. The probability mass being more than one, while we assume it is one, has us understating the probabilities of success. Our assumptions have us walking away from nice upsides.

All these changes happen as the number of data points, n, increases.

The distribution started when we asserted the existence of a stochastic variable. This action puts a Dirac function on the red plus sign that sits at the initial origin, (0,0), of the unit circle of the distribution. This value for the origin at n = 0 should appear in black, which is used here to encode the observable values of the parameter.

Watch the following animation. It shows how the footprint of the distribution changes as n increases. The distribution comes into existence and then traverses the geometries from the origin to the distant shell of the eventual sphere. This animation shows how the normal achieves the unit circle once it begins life from the origin, and traverses from hyperbolic space to Euclidean space.

In the very first figure, the light gray text lists our assumptions. The darker gray text records observations from the figure. The origin and the radius are such observables. The red text marks implied values. We assume a normal, so the mean and the standard deviation are implied from that. The black text marks knowns, given that the distribution is in hyperbolic space.

The color codes are a mess. It really comes down to assertions cascading into other assertions.

The thick red circle shows us where the sample of the means happens as n increases. We have a theoretical mean for the location of the origin that needs to be replaced by an actual mean. Likewise, we have a theoretical standard deviation. That standard deviation controls the size of the distribution, which will move until normality is achieved in each of the two underlying dimensions. Notice that we have not specified the dimensions. And, those dimensions are shown here as having no skew. We assumed the normal has already achieved normality.
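That circle of sampled means tightening as n increases is the standard-error effect: the spread of sample means falls roughly as 1/sqrt(n). A simulation sketch with a deliberately skewed source distribution (an exponential, my choice for illustration, not from the figure):

```python
import random
import statistics

random.seed(0)

def spread_of_means(n, trials=2000):
    """Standard deviation of the sample mean for samples of size n,
    drawn from a skewed (exponential) source distribution."""
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
             for _ in range(trials)]
    return statistics.pstdev(means)

# The circle of sampled means tightens as n grows.
s4 = spread_of_means(4)
s100 = spread_of_means(100)
print(s4 > s100)  # True
```

The theoretical spread here is 1/sqrt(n), so quadrupling the data only halves the circle, which is why achieving normality takes so long.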

OK. So what?

We hear about p-hacking and how the statistical significance parameters fail to actually represent anything about the inferences being made these days. But, hyperbolic spaces are different in terms of inference. The inference parameters α and β are not sufficient in hyperbolic space, as illustrated in the following figures.

Overlapping based on the Assumed Normals
Overlapping of the Hyperbolic Tails

In the figures, I did not specify the α and β values. The red areas would be those specified by the α and β values, so they would be smaller than the areas shown. I’ll assume that the appropriate values were used. But in the first diagram, there would be statistical significance where there is no data at all. In the second diagram, the statistical significance would again be based on the asserted normal, but the results would still include some data from the hyperbolic tails, though not much.
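The gap between the assumed normal and the actual tail can be made concrete. As a stand-in for a heavier-than-normal tail (my substitution for illustration, not the hyperbolic tail of the figures), a Laplace distribution with the same variance carries several times the mass past a 3-sigma cutoff that the assumed normal admits:

```python
import math
from statistics import NormalDist

cutoff = 3.0  # a "3-sigma" significance cutoff under the assumed normal

# Tail mass the assumed standard normal puts past the cutoff.
normal_tail = 1.0 - NormalDist(0.0, 1.0).cdf(cutoff)

# Laplace with the same variance (2 * b**2 = 1, so b = 1/sqrt(2));
# its upper tail is P(X > x) = 0.5 * exp(-x / b) for x > 0.
b = 1.0 / math.sqrt(2.0)
laplace_tail = 0.5 * math.exp(-cutoff / b)

print(laplace_tail > normal_tail)          # True: heavier tail than assumed
print(round(laplace_tail / normal_tail))   # roughly a 5x understatement
```

Significance computed from the assumed normal is silent about this extra mass, which is the point of the overlapping-tails figures.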

The orientation of the tails would matter in these inferences. That requires more than a snapshot view. The short tail of a given dimension orients the distribution before normality is achieved. Given the dependence of this orientation on the mode, and given that a normal distribution has many modes over its life, orientation is a hard problem. Yes, asserting normality eliminates many difficulties, but it hides much as well.

As product managers, we assume much. Taking a differential view will help us make valid inferences. And, betting on the short tails, not the long tails, will save us time and effort. We do most of our work these days in the late mainstreet or later phases. Statistics is actually on our side, because the probabilities are higher than we know, and there are multiple pathways, or geodesics, that we can follow.