“… Yet one organism may be five mutations away from a new useful feature, and the other organism, after collecting neutral changes, only one. In adaptive hardship, it will be in a much better position to evolve. Neutral evolution and Darwinian evolution, instead of being exclusive, can operate in symbiosis. …”

Christensen called for separation in *The Innovator’s Dilemma*, his best idea, and it didn’t get traction. I read that book when it first hit the bookstores. It didn’t get traction because the business orthodoxy of the typical business wouldn’t allow it. Late mainstreet is typical of that business orthodoxy, which fosters the notions of innovating at scale and strategic alignment, both of them contrary to the technology adoption lifecycle. Christensen didn’t go into what separation was in any implementable detail.

Well, proteins innovate. And, proteins don’t care about the business orthodoxy. Proteins have to innovate on demand, and do so without scale. Proteins are ready. The article explains how. The short answer is that proteins do more than one thing. A protein does its main thing, and does other, seemingly irrelevant things, those irrelevant things being neutral changes that have a symbiotic, not free, relationship with the main process of the protein. When fitness changes, those relationships can change in ways that provide a rapid response.

The technology adoption lifecycle takes a discontinuous innovation from birth to death, which redefines fitness in every phase. The continuous innovations we see today skip the early phases, demanding instead near-instantaneous innovation at scale in the late mainstreet phase. Well, proteins do not do that. They birth biological processes ahead of the need just because the generative space, a niche, is there to be entered. That is a speciation process, a birthing of a category in isolation long before its explosion into the wider world, a birthing before the chasm, long, again, LONG, before it is an innovation “at scale” in the late mainstreet phase.

WARNING: When I talk about chasms, I’m talking about Moore’s chasm in the first edition of his *Crossing the Chasm* book. There have been a second edition and a third. They are not the same book. The third edition is why so many people think they are crossing a chasm when they are not. The innovation must be discontinuous before you have a chasm to cross.

Almost no one innovates discontinuously these days. Nicholas Negroponte complained about that in one of his addresses where he mentioned the Media Lab. Nobody there these days is trying to change the world. They just want to make some money.

After the Web 1.0 bust, Moore moved over to the business orthodoxy to survive. His books always followed the development of his methodology. He changed the message to sell more books. What was lost was the process underlying discontinuous innovation. He told us how. We got distracted.

But, proteins are making the case again. Fitness changes. Fitness can undergo discontinuous change. Evolution forces proteins to follow suit.

It doesn’t force product strategists to follow suit. It doesn’t force product managers to follow suit. Nor does it make the underlying technologies and their user-facing products follow suit.

It’s not tough. It’s not efficient. Having a capability, a process, and a staff trained to use that capability is necessary. The neutral symbiosis will save you from the long transition Apple has undertaken to get to its next thing. Microsoft stumbled for a while as well. The lack of neutral symbiosis is part of the incumbent’s problem.

When I tweet my Strategy as Tires tweets, the speaker is a CEO in a company doing discontinuous innovation in an ongoing manner. He keeps the bowling alley full, the functions in their phases, and the categories moving on before they die. And, yes, those neutral symbionts are kept lying in wait for their moment to pounce. Take that, you incessant changes. They are ready.

Mostly, we are stuck in the past, while we quote the movers, those that were ready when fitness changed.

Enjoy!

In statistics, and in all of math, independent variables are orthogonal. And, in equations, one side of the equals sign is a collection of independent variables, and the variables on the other side are dependent variables. Independent and dependent variables have relationships.

Now, change subjects for a moment. In MS Project, or in all projects, you have independent tasks and dependent tasks. And, these independent and dependent tasks have relationships.

Statistics was built on simple math. Simple math like the Pythagorean Theorem. You can argue about what is simple, but the Pythagorean Theorem is math BC, aka before calculus.

Distance is one of those simple ideas that gets messy fast, particularly when you collect data and you have many dimensions. The usual approach is to add another dimension to the Pythagorean Theorem. That’s what I was expecting when an email sent me out to the *Better Explained* blog. The author of that blog always has another take. I read this month’s post on another subject and went looking for what else I could find. I found a post, “How to Measure Any Distance with the Pythagorean Theorem.” Read it. Yes, the whole thing. There is more relevant content than I’m going to talk about. The author of this post assumes a Euclidean geometry, which around here means my data has achieved normality.
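That post’s trick of chaining triangles generalizes to any number of dimensions. A minimal Python sketch of the idea; the function name and the tuple coordinates are mine, not the post’s:

```python
import math

def pythagorean_distance(p, q):
    """Distance between two n-dimensional points, built by chaining
    the Pythagorean Theorem one dimension at a time."""
    hypotenuse = 0.0
    for a, b in zip(p, q):
        leg = b - a
        # The previous hypotenuse becomes one leg of the next triangle.
        hypotenuse = math.hypot(hypotenuse, leg)
    return hypotenuse

print(pythagorean_distance((0, 0, 0), (1, 2, 2)))  # → 3.0
```

Each added dimension just stacks one more right triangle on the last hypotenuse, which is exactly the “add another dimension” move described above.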

He builds up slowly. We’ll just dive into the deep end of the cold pool. You know this stuff like I do, or, like me, assume you know this stuff.

In this figure, I labeled the independent and dependent variables. This labeling assumed that finding z was the goal. If we were trying to find b, then b would be dependent, so the labels would be different.

In the software as media model, **a** would be the carrier code, and **b** would be the carried content. Which implies that **b** is the unknown situation. The developer doesn’t know that stuff yet. And, without an ethnographer, might never know that stuff. Steve Jobs knew typography; the developers of desktop publishing software 1.0 didn’t. But, don’t worry, the developers won the long war with MS Word for Windows, which didn’t permit graphic designers to specify a grid, something that could be done in MS Word for DOS. Oh, well.

Those triangles would be handoffs, which is one of those dreaded concepts in Agile. The red triangle would be your technical writer; the orange, your training people or marketing. However you do it, or they do it.

There are more dependent variables in the equation from the underlying source diagram so I drew another diagram to expose those.

The independent variables are shown on a yellow background. The dependent variables are shown on a white background. Notice that the dependent variables are hypotenuses.

In an example of linear regression that I worked through to the bitter end, new independent variables kept being added. And, the correlations kept being reordered. This was similar to the ordering of factors in a factor analysis, which runs from the steeper, longer-working factors to the flatter, shorter ones. There was always another factor, because the budget would run out before the equation converged with the x-axis.

This particular view of the Pythagorean Theorem gives us a very general tool that has its place throughout product management and project management. Play with it. Enjoy.

So my read began. The first thing that struck me was a diagram of a box plot. It needed some interpretation. The underlying distribution is skewed. If the distribution were normal, the median would be in the middle of the rectangle; here, the median is slightly closer to 1.0. You can find the middle by drawing diagonals across the rectangle. They would intersect at the mean. In a distribution that has achieved normality, the mean, the median, and the mode converge. You will see this in later diagrams. The box plot is shown here in standard form.

Each quartile contains 25 percent of the dataset.

Skewed distributions should not be prevalent in big data. So we are talking small data. But how can that be, given that the box plot is typically used in daily stock price reporting? We’ll get to that later.

In big data, normality is usually assumed. I got on this “is it normal” kick when I read a big data book telling me not to assume normality. As I’ve done since then, I call it out, as I’m going to do in this post. Normality takes at least 2048 data points in a single dimension. So five dimensions require 5×2048, or 10,240 data points. When we focus on subsets, we might have fewer than 2048 data points, which gives us a skewed normal. In n-dimensional normals, the constituent normals that we are assuming are normal are not, in fact, normal yet. They are still skewed.
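The small-sample effect is easy to see in code. A quick sketch, assuming a plain moment-based skewness calculation; it illustrates that small samples from a perfectly normal source still show skew, not the 2048 threshold itself:

```python
import numpy as np

def sample_skewness(x):
    """Moment-based skewness: third central moment over the cube
    of the standard deviation."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s = x.std()
    return ((x - m) ** 3).mean() / s ** 3

rng = np.random.default_rng(0)
small = sample_skewness(rng.normal(size=50))       # small sample
large = sample_skewness(rng.normal(size=100_000))  # large sample

# The source is normal either way; the measured skew fades as n grows.
print(abs(small), abs(large))
```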

We mostly ignore this at our peril. When we make statistical inferences, we are assuming normality because the inference process requires it. Yes, experts can make inferences with other distributions, and no distribution at all, but we can’t.

I’ve read some papers on estimating distribution parameters where the suggested practice is to compute the parameters using a formula that gives you the “standardized” mean and standard deviation.

I revised the above figure to show some of the things you can figure out given a box plot. I added the mean and mode. The mode is always on the short tail side of the distribution. The mean is always on the long tail side of the distribution. If the distribution had achieved normality, the median would be in the middle of the box. As it is, the median is below the center of the rectangle, so it will take more data points before the distribution achieves normality. In a skewed normal, the mean and mode diverge symmetrically from the median. Once normality is achieved, the mode, mean, and median would converge to the same point. There would be a kurtosis of 3, which indicates that the tails are symmetrical. That implies that the curvature of the tails is the same as well.
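The mode-on-the-short-tail, mean-on-the-long-tail ordering can be checked with a quick sketch, assuming a histogram-peak estimate of the mode and a lognormal stand-in for a right-skewed distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.75, size=100_000)  # right-skewed

mean = data.mean()
median = np.median(data)
counts, edges = np.histogram(data, bins=200)
peak = counts.argmax()
mode = (edges[peak] + edges[peak + 1]) / 2  # histogram-peak estimate

# Long tail on the right, so: mode < median < mean.
print(mode, median, mean)
```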

That curvature would also define a torus sitting on top of the tails. When the distribution is not yet normal, or is skewed, that torus would be a cyclide. A torus has a constant radius, while a cyclide is a tube that starts with a small radius, which increases as it is swept around the normal from the short tail to the long tail. The long tail is where the tube has the largest radius. Neither of these is shown in this diagram. That cyclide is important over the life of the distribution, because it orients the normal. Once the distribution achieves normality, that orientation is lost due to symmetry, or not. That challenges some simplifying assumptions I will not address today, as in further research is required. But, accepting the orthodox view, symmetry makes that orientation disappear.

I showed, in black, where the core of the normal would be. I also indicated where the shoulder of the distribution would be. Kurtosis and tails start at the shoulder. The core is not informative. I used a thick red arrow pointing up to show how the mode, median, and mean would converge or merge. In a skewed distribution, the median is leaning over. As the distribution becomes more normal, it stands up straighter. Once normality is achieved, the median is perpendicular to the base of the distribution. Notice that the short tail does not move. I also show, using a thick red arrow pointing down, how the long tail will contract as the distribution becomes normal.

Invest on the stable side of the distribution, or infer on the stable side. Those decisions will last long after normality is achieved.

The next figure shows how to illustrate the curvature of the tails given just the box plot and some assumptions of our own.

We begin here on the axis of the analyzed dimension, shown in orange. I’ve extended this horizontal axis beyond the box plot, shown in red.

The distance from the mean to the maximum value in the box chart ends at the point at the top of the diagram marked with a “^” symbol rotated ninety degrees. This point is also labeled, in blue, as a point of convergence. That distance is one half the side length of the associated square, shown in red. The circle inside that square represents the diameter of the cyclide tube at the long tail.

The distance from the mode to the minimum value in the box chart ends at the point at the bottom of the diagram marked with a “^” symbol and labeled as a point of convergence. Again, that distance is one half the side length of the associated square, which contains a circle representing the diameter of the cyclide tube at the short tail.

On both of the circles, the blue portions represent the curvatures of their respective tails. Here is where some assumptions kick in, as well as the limitations of my tools. There are diagonals drawn from the mean and the mode to the origins of the respective curvature circles. Each has an angle associated with it. The blue curvature lines are not data driven. The curves should probably be lower. If we could rotate those red boxes in the direction of the black circular arrow, while leaving the circles anchored at their convergence points, and clip the blue lines at the green clipping lines, we’d have better curvatures for the tails.

A tube would be swept around from the small circle to the large circle and continuing around to the small circle.

Here the light blue lines help us imagine the curvature circles being swept around the core of the distribution. This sweep generates the cyclide. This figure also shows the distribution as being skewed. The median eventually stands up perpendicular to the base plane. The purple line equates this standing up of the median with the moment when the distribution has enough data points to no longer be skewed. The distribution would finally be normal. The cyclide would then be a torus. The short tail radius would have to grow, and the long tail radius would have to shrink.

So how does a multidimensional normal end up with a two dimensional distribution and a one dimensional box chart? The box chart shows the aggregation of a lot of information that gets summarized into a single number, the price of the share. Notice that frequency information is encoded in the box chart quartiles, but that is not apparent.

Notice that outliers might extend the range of the dimension. They are not shown. The box chart reflects the market’s knowledge as of the time of purchase. Tomorrow’s information is still unknown. The range of the next day’s information is unknown as well. The number of data points will increase so the distribution could well become normal. But, the increase in the number of data points tomorrow is unknown.

Had we built product and income streams into the long tail, we would be out of luck.

Enjoy.

Now, we have Poisson distributions, small data, and big data. We have hyperbolic spaces, Euclidean spaces, and spherical spaces, among many spaces. We have linear spaces and non-linear spaces. We have continuous spaces and finite spaces. Truth is no longer binary. Inference is still based on normal distributions. Those normals become symmetric. Skewness and kurtosis give us long tails and short tails. Pi is variable. And, the total probability mass is tied to pi, so it is also variable, running from more than one to less than one.

The number of data points, n, drive our distributions differentially. “We are departing our estimated normal. But, we will be travelling through a skewed normal for a while.” You have to wonder if that probability mass is a gas or a solid. Is the probability mass laid out in layers as the modes and their associated tails move?

It’s too much, but the snapshot view of statistics lets us ignore much, and assume much.

This figure started out as a figure showing what a normal distribution in the Lp geometry looks like when p = 0.5. This is shown in blue. This is a normal in hyperbolic space. The usual normal that we are familiar with happens in L2 space, or Lp space where p = 2. This is the gray circle that touches the box that surrounds the distribution. That circle is a unit circle of radius 1.

The aqua blue line in the first figure shows the curve at, say, p = 0.1. The figure immediately above shows what happens as p increases: the line approaches and then exceeds the p = 2 curve. At p = 1, the line would be straight, and we would have a taxicab geometry. The value of p can exceed 2. When it does, the distribution has entered spherical space. The total probability density equals 1 at p = 2. It is more than 1 when p < 2. It is less than 1 when p > 2.
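To make those unit-circle shapes concrete, here is a sketch that parametrizes the first-quadrant Lp unit circle, |x|^p + |y|^p = 1, for a few values of p. The code only checks the shape equation; the probability-mass claims above are not something it tests:

```python
import math

def lp_unit_circle_point(t, p):
    """Point on the first-quadrant Lp unit circle at parameter t in (0, pi/2).
    Uses x = cos(t)**(2/p), y = sin(t)**(2/p), so |x|**p + |y|**p = 1."""
    return math.cos(t) ** (2 / p), math.sin(t) ** (2 / p)

for p in (0.5, 1.0, 2.0, 4.0):
    x, y = lp_unit_circle_point(math.pi / 4, p)
    # Every generated point satisfies the Lp unit-circle equation.
    print(p, round(x, 4), round(y, 4), x ** p + y ** p)
```

At p = 0.5 the curve pinches in toward the axes (the hyperbolic look), at p = 1 it is the straight taxicab diamond, at p = 2 the familiar circle, and beyond p = 2 it bulges toward the bounding box.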

The process starts with that Dirac function where the line goes to infinity. Then, the probability mass flows down into an eventual normal. That eventual normal traverses across the Lp geometries. The geometry is hyperbolic until the Lp geometry reaches L2, where p = 2. While hyperbolic, the total probability mass is more than one. The L2 geometry is the standard normal. In the L2 geometry, the total probability mass is one. Then the Lp geometry exceeds p = 2. This is the spherical geometry, where the probability mass migrates to the shell of the sphere, leaving the core empty. At this point, the total probability mass is less than one.

Notice that the early adopter phase of the technology adoption lifecycle happens when the total probability mass is more than one. And, the late mainstreet and later phases happen when the total probability mass is less than one. These changes in geometry mess with our understanding of our financial projections. That early adopter phase is relative to discontinuous innovations, not continuous innovations, as the latter happen in the mainstreet or later phases. That early adopter phase is commonly characterized as being risky, but this happens because hyperbolic spaces suppress projections of future dollars, and because of the problems of investing in skewed distributions, where the long tails contract while the short tails remain anchored. The probability mass being more than one, while we assume it is one, has us understating the probabilities of success. Our assumptions have us walking away from nice upsides.

All these changes happen as the number of data points, n, increases.

The distribution started when we asserted the existence of a stochastic variable. This action puts a Dirac function on the red plus sign that sits at the initial origin, (0,0), of the unit circle of the distribution. This value for the origin, at n = 0, should appear in black, which is used here to encode the observable values of the parameter.

Watch the following animation. It shows how the footprint of the distribution changes as n increases. The distribution comes into existence and then traverses the geometries from the origin to the distant shell of the eventual sphere. This animation shows how the normal achieves the unit circle once it begins life from the origin, and traverses from hyperbolic space to Euclidean space.

In the very first figure, the light gray text lists our assumptions. The darker gray text is observations from the figure. The origin and the radius are such observables. The red text marks implied values. We are assuming a normal, so the mean and the standard deviation are implied from that. The black text marks the knowns, given that the distribution is in hyperbolic space.

The color codes are a mess. It really comes down to assertions cascading into other assertions.

The thick red circle shows us where the sampling of the means happens as n increases. We have a theoretical mean for the location of the origin that needs to be replaced by an actual mean. Likewise, we have a theoretical standard deviation. That standard deviation controls the size of the distribution, which will move until normality is achieved in each of the two underlying dimensions. Notice that we have not specified the dimensions. And, those dimensions are shown here as having no skew. We assumed the normal has already achieved normality.

OK. So what?

We hear about p-hacking and about statistical significance parameters no longer actually representing anything about the inferences being made these days. But, hyperbolic spaces are different in terms of inference. The inference parameters **α** and **β** are not sufficient in hyperbolic space, as illustrated in the following figures.

In the figures, I did not specify the **α** and **β** values. The red areas would be those specified by the **α** and **β** values, so they would be smaller than the areas shown. I’ll assume that the appropriate values were used. But, in the first diagram, there would be statistical significance where there is no data at all. In the second diagram, the statistical significance would again be based on the asserted normal, but the results would still include some data from the hyperbolic tails, though not much.

The orientation of the tails would matter in these inferences. That requires more than a snapshot view. The short tail of a given dimension orients the distribution before normality is achieved. Given the dependence of this orientation on the mode, and given that a normal distribution has many modes over its life, orientation is a hard problem. Yes, asserting normality eliminates many difficulties, but it hides much as well.

As product managers, we assume much. Taking a differential view will help us make valid inferences. And, betting on the short tails, not the long tails, will save us time and effort. We do most of our work these days in the late mainstreet or later phases. Statistics is actually on our side, because the probabilities are higher than we know, and there are multiple pathways, or geodesics, that we can follow.

Enjoy.

The link in that blog post led to another blog post, *“Obesity index: Measuring the fatness of probability distribution tails,”* which gave rise to Rick Wicklin’s comment mentioning quartile-based skewness. Skewness is usually based on moments. Rick’s comment links to another blog post, *“A quantile definition for skewness,”* that discusses quartile-based skewness.

The usual definition of skewness is the Pearson, moment-based definition, built from the moments of the function that represents the distribution, typically a probability density function (pdf) or a cumulative distribution function (cdf). The zeroth moment of the distribution is the total probability. The first moment is the mean. The second central moment is the variance. The third standardized moment is skewness. The fourth is the kurtosis. There are two more moments following those. And, there can be more moments beyond those. Or, just say there are moments all the way down, down to zero, if we are that lucky.
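That ladder of moments is compact in code. A sketch, assuming the standardized central-moment definitions and a normal sample to check against:

```python
import numpy as np

def moments(x):
    """Mean, variance, skewness, and kurtosis from central moments."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    var = ((x - m) ** 2).mean()       # second central moment
    sd = var ** 0.5
    skew = ((x - m) ** 3).mean() / sd ** 3  # third standardized moment
    kurt = ((x - m) ** 4).mean() / sd ** 4  # fourth; a normal → 3
    return m, var, skew, kurt

rng = np.random.default_rng(2)
mean, var, skew, kurt = moments(rng.normal(size=200_000))
print(skew, kurt)  # skewness near 0 and kurtosis near 3 for a normal sample
```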

The quartile definition of skewness bypasses all that calculus. The skewness discussed in the post on the quantile definition is known as Bowley or Galton skewness.

Keep in mind, throughout this post, that we are talking about a skewed normal. The skewed normal is asymmetric. The individual quartiles are not unit measures. The range is divided into four, but the divisions for the skewed normal are not uniform. So I drew the quartiles as having a random median (Q2). That leaves us with Q0, Q1 and Q3, Q4 as being arbitrary. Once the distribution achieves normality, the quartiles would have the same widths, the distribution would be symmetrical, the skew would be zero, and the mean, median, and mode would converge to the same value.

You may have seen these quartiles expressed as a box-whisker plot.

These box-whisker plots show up in daily stock price graphs. That hints towards investment considerations. We won’t use that language here.

What we need is the quartile correlations in box-whisker plots so we can move on to the calculation. And, so we can develop some intuitions about skewness and skewed distributions.

A more stock market view of a box-whisker plot follows.

Notice that skewed distributions are described by their medians. A normal, assuming a skewness of 0, is symmetric. Skewed distributions have yet to achieve normality. Once normality is achieved, the mean, the median, and the mode converge to the same value. In the yet-to-achieve-normality, skewed normal, these forms of the average are distinct. The mode and the mean move away from the median and sandwich the median between them. In my posts here, I’ve described the skewed normal as having a median that rests on the mode and leans at some angle toward the mean.

Here I’ve shown the association between the box-whiskers plot and the underlying skewed distribution. I’ve shortened the whiskers portion of the box-whiskers plot so it lines up with the range of the distribution. The tails are at Q_{0} and Q_{4}. In this figure, the short tail is at Q_{0}. In skewed distributions the long tail will be on the opposite side of the median.

On any dimensional axis, here the two-dimensional projection along the x-axis, the short tail is anchored, and the long tail contracts towards the short tail as the distribution achieves normality, which it does as the number of data items, n, increases. Once n is greater than one, the projection down to the x and y axes compresses the tails, so the tail on the opposite side of the median is no longer associated.

Here we see what negatively and positively skewed distributions look like. The distribution we associated with our box-whisker plot was positively skewed. Notice the gray dashed lines inside each distribution. They show us what the normals would look like once normality is achieved. Again, the short tail is fixed, or anchored. The short tails do not move. The long tails contract towards the short tails.

The figures don’t show us how normalization would change the shape of the normals. They would be taller. The volumes under the curves would not change as the shapes of the distributions change.

But here is the thing, we are investing over time. Over time, the distribution would become symmetrical. Money spent near the short tail would be conserved while money spent out near the long tail would be lost. Functionality serving customers in the long tail would be stranded. Given that we deal with statistics on the basis of snapshot pictures, and that we assume normality, we wouldn’t see why we lost the money and time we spent out on the long tail. We might not realize that our operational hypotheses are no longer valid.

So back to Bowley skewness, one of the quick, no-calculus-involved ways to calculate skewness.

**Skewness = ((Q_{3} − Q_{2}) − (Q_{2} − Q_{1})) / (Q_{3} − Q_{1})**
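A quick sketch of that formula, assuming numpy’s default quantile interpolation, applied to a right-skewed exponential sample and a symmetric normal one:

```python
import numpy as np

def bowley_skewness(x):
    """Quartile-based (Bowley/Galton) skewness: no calculus required."""
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)

rng = np.random.default_rng(3)
right_skewed = rng.exponential(scale=1.0, size=100_000)
symmetric = rng.normal(size=100_000)

print(bowley_skewness(right_skewed))  # positive for a long right tail
print(bowley_skewness(symmetric))     # near 0 for a normal
```

Positive values mean the long tail is on the right of the median, negative values mean it is on the left, and zero is what you see once the quartiles have the same widths.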

So at the end of this post, we stepped out for a short walk. We walked around a familiar neighborhood and found some interesting things, and we found some confirmations for a few intuitions. Anything can take you to serendipity and surprise. Enjoy.

Innovating continuously in a discontinuous manner, because of the phase structure of the technology adoption lifecycle, means holding on to people, organizational units, and processes that we will need again later in the lifecycle, without laying them off, and without forgetting and erasing these elements when they are not the focus of what we are doing right now. Companies that forget that they will be doing this again soon end up standing around, with nothing to do, when their current product has reached the end of its category’s life. It was quite a while before Lotus moved on from their spreadsheet product to their next product. We see Apple doing that waiting right now. It’s something they did after the Mac as well.

Keeping the bowling alley full cuts down on that waiting around.

Attending to your market allocation also helps you see your performance relative to the market, rather than some KPI maximizations and guesses.

I spent a few hours today breaking a market down into the allocations to each competitor, given that everyone followed the rule for being a near monopoly. That rule says: never exceed 74% of market share. That is probably a market leader rule. I’ve applied it across all the competitors. In markets where the companies never even get close to a monopolistic position, doing the 74% exercise across all market participants tells you who is operating at 100% and who is not.

I know the market leader gets 74%. This, of course, happens only if they are operating at peak performance. Companies don’t usually do that. But graphing the thing again tells you who is and who isn’t.

The market leader gets 74% of the market assuming the market leader is operating well enough to achieve that 74%. This means that the market leader owns the market, defines it, establishes standards–doing all the other market owner duties. Those duties include seeing to it that the value chain members are thriving, and the employees in the companies comprising that value chain are having viable, long-lasting careers. The market leader tends to the economic wealth of the category. They don’t take it all. They don’t usurp functionality built by members of their third-party vendor network. The market leader has to evolve some sense of the accounting of the category, an accounting reaching beyond their firm. And, to hit that 74% number they have to be operating at peak production. There are two accountings: the inter-entity accounting across the value chain, and the intra-entity accounting of the firm.

The market leader leaves 26% for everyone else. In the figure, I’ve applied that 74% share across all competitors. That’s not realistic. But, instead of using that number, you can use the market leader’s real numbers, and the real numbers of all competitors. But, those numbers are lagging indicators. Use the 74% share as an ideal. Then, once the real numbers come in, graph the situation again.

The company in the number two position in the market will take 74% of the remaining 26%. They get 19% of the market. The third company in the market gets 5% of the market. And, the fourth company gets 1%. The rest of the companies get a piece of the remaining market, roughly half a percent in total. This might seem awful, but to latecomers go the spoils. Still, if the total market is in the $B or $T range, then there will be many more companies that can be included before market share runs out.
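The cascade is mechanical. A sketch, assuming each successive competitor takes the 74% near-monopoly cap of whatever share remains:

```python
def near_monopoly_shares(n_competitors, cap=0.74):
    """Allocate market share where each competitor in turn takes
    the 74% near-monopoly cap of whatever remains."""
    shares, remaining = [], 1.0
    for _ in range(n_competitors):
        take = remaining * cap
        shares.append(take)
        remaining -= take
    return shares

for rank, share in enumerate(near_monopoly_shares(5), start=1):
    print(f"#{rank}: {share:.1%}")
# → #1: 74.0%, #2: 19.2%, #3: 5.0%, #4: 1.3%, #5: 0.3%
```

The idealized 74%/19%/5%/1% allocation above falls straight out of the recursion; everything after fourth place is fighting over fractions of a percent.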

In the discontinuous innovation situation, the market leader defines the category. The first five companies fight it out for market leadership, so they all contribute to how the category does business. The bowling alley can provide you with the seats and dollars you need to be the market leader as soon as you enter the tornado. Getting through the bowling alley, across the carrier-carried content flip that happens when you go from the verticals to the IT horizontal, and across the other changes in operational focus associated with the phase change are all critical to making it as the market leader in the IT horizontal.

One last comment: the near-monopoly allocations are an upper limit. In later, non-monopolistic markets, the competitors will not exceed the envelope defined by the near-monopolistic allocations. If 20% of the market leader’s share is taken by the nearest competitor, you are still talking a 54%/39% split. The market leader still has a significant advantage. In commodity markets, the numbers are much closer, say **46%**/**47%**. The market leader role doesn’t move quickly, so market leadership can persist even with second-place share. And, the competition is not zero-sum between the first and second place competitors. Market share can be gained and lost across the entire long tail.

Enjoy!

Eventually, SEO got away from most of us and ended up in the hands of SEO experts. And, in the race, print marketing communications went out of style, and many printers went out of business. Paper was weight, weight was expensive, and cost justifications that server logs gave the SEO crowd were nonexistent for print. It wasn’t the internet that disrupted print, it was server logs. Fixing this disruption should have been easy, but the print industry wasn’t listening. We already had the technology. Oh, well.

These days, product managers talk about talking to the customer. Really? What customer? Are we talking prospects, users, managers of users, managers of functional units, or managers of business units? The technology adoption lifecycle defines the person we are selling to differently in different phases. Alternate monetizations drive another set of customer definitions. So who the hell are we supposed to talk to?

With SaaS-based applications, every click can be analyzed via SEO methods to generate a long tail of feature use. We can associate these tails with users and customers across all of our definitions. We could know what the heck was going on as soon as we had a normal distribution for a particular click. Sorry, Agilist, but you don’t have enough data. Much would be seen in the changes to those long tails.
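That tally is easy to sketch. A minimal example, assuming a hypothetical click log of (user, feature) pairs pulled from a server log; the feature names and counts here are made up:

```python
from collections import Counter

# Hypothetical click log: (user, feature) pairs pulled from a server log.
clicks = [
    ("ann", "export"), ("ann", "search"), ("bob", "search"),
    ("bob", "search"), ("cal", "search"), ("cal", "export"),
    ("cal", "print"), ("ann", "search"), ("bob", "export"),
]

# Tally feature use, then rank it. The sorted tally IS the long tail:
# a few heavily used features up front, many rarely used ones behind.
feature_use = Counter(feature for _, feature in clicks)
long_tail = feature_use.most_common()
print(long_tail)  # [('search', 5), ('export', 3), ('print', 1)]

# The same tally keyed by (user, feature) gives the per-customer tail.
per_user = Counter(clicks)
```

Watching how `long_tail` reshapes itself over successive time windows is where the changes the paragraph above talks about would show up.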

Earlier, when every touchpoint captured data, we could watch prospects mature as we and they crossed the normal distribution of our technology adoption lifecycle. We could watch their onboarding, their use, their learning, their development of expertise, and their loyalty effects.

Today, I checked out a book that was mentioned on a blog that was mentioned in my Twitter stream. Amazon showed the customer reviews in an interesting manner that I’d not seen before. It broke down the average score into the contributing levels of satisfaction. As it is, this is great for retention efforts and social networking efforts. It would also be useful in our presale marketing communications efforts. Prospects are ready to buy only when they reach a 5-star rating across all the marketing communications they’ve touched across the entire buying team.

It would be great in our efforts to develop new features, aka use cases, user stories, and other such efforts. We could push this further by capturing negative reviews, which when tied to the application’s long tail and the individual customer would tell us what we needed to do to retain the customer across all definitions of the customer. If a customer that gave us rave reviews suddenly isn’t, it wouldn’t be sudden if we were paying attention, and it wouldn’t have to end with a retention effort. There is a long tail of customers, not just SKUs. In a software application, every feature is an SKU.

All of this would require an infrastructure that more widely defined what we captured in our server log and what analytic equivalence would look like in all these uses beyond SEM.

In the adoption lifecycle, we could break down the clicks from every pragmatism slice. That would tell us how soon a given pragmatism slice would be ready to buy, and that would inform the marketing communications effort and the feature development effort. We’d know what that pragmatism slice wanted and when. We’d know how well our marketing communications is working for that slice. It would greatly inform our tradeoffs.

One last thing, customers don’t know what we need them to know, so they can’t tell us about the future. Without good definitions of the generic customer, we could be talking to the wrong customer and addressing the wrong issues. We could be taking value to a particular “customer” that would never care about the delivery of that value.

Enjoy.

]]>Yes, there is a process for inscribing a quadrilateral. The process works in theory. But, more often than not, it does not work in practice. But, just for the fun of it, draw a quadrilateral and only the quadrilateral. Draw it on a blank sheet of paper.

The procedure is to find the midpoints of the sides. Then, draw lines joining the opposite midpoints. The intersection of those two lines would be the center of the circle circumscribing the quadrilateral. Try it!
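A quick numerical check, with made-up vertices, shows why the drawing so often fails: the two midpoint-joining lines (the bimedians) bisect each other, so their intersection is always the centroid of the four vertices. That coincides with the circle’s center for a square, but not for a general quadrilateral whose vertices sit on a circle:

```python
import math

def bimedian_center(quad):
    # Midpoints of the four sides.
    mids = [((x1 + x2) / 2, (y1 + y2) / 2)
            for (x1, y1), (x2, y2) in zip(quad, quad[1:] + quad[:1])]
    # The bimedians bisect each other, so their intersection is the
    # midpoint of either one, i.e. the centroid of the four vertices.
    (ax, ay), (cx, cy) = mids[0], mids[2]
    return ((ax + cx) / 2, (ay + cy) / 2)

def vertex_distances(center, quad):
    cx, cy = center
    return [math.hypot(x - cx, y - cy) for x, y in quad]

# A square on the unit circle: the procedure lands dead center.
square = [(1, 0), (0, 1), (-1, 0), (0, -1)]
d1 = vertex_distances(bimedian_center(square), square)

# An irregular quadrilateral, vertices still on the unit circle:
# the procedure misses the true center (0, 0).
irregular = [(math.cos(t), math.sin(t)) for t in (0.0, 1.0, 2.2, 4.5)]
d2 = vertex_distances(bimedian_center(irregular), irregular)

print(max(d1) - min(d1))  # ~0: equidistant, center found
print(max(d2) - min(d2))  # clearly nonzero: not the circle's center
```

So the procedure only “works” on the ideal shapes; on a freehand quadrilateral it finds a center, just not the one we wanted.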

Theoretically, of course.

Because you tried it and failed. Keep trying. Keep failing. Call it risky. Conclude that inscribing a quadrilateral is risky. Then, go forward with something simpler, something more achievable, something more amenable to process, something simpler to invest in.

But there is a cheat. Draw the circle and then draw the quadrilateral with the vertexes on the circle. Yes, you end up with a perfect inscribed quadrilateral. You are in Euclidean space.

In those earlier attempts, you were venturing into an unknown geometric space. But, as a manager who knows their numbers, a manager that cranks out numbers and decisions in L2, in the geometric space of the spreadsheet, you only travel in L2. A safe assumption is that L2 is Euclidean although I’ve seen the Euclidean metric being dropped into a spreadsheet. This was done to promote the Euclidean space from an assumption to an explicit assertion.

I’ll assume that I cannot inscribe a quadrilateral in L2.

I was doing this on paper. I’d need a vector graphics-based drawing application to show you. I’ll give it a shot in MS Paint. We need to see it. And, you probably don’t have a compass in that pocket protector you don’t wear outside the house.

First, I’ll draw the cheat, the circle, the root of all things Euclidean. Well, a lot of Euclidean things. Circles were ideals. Mathematicians love their ideals because they make the calculations simple.

Back in our high-school geometry class, we were told that there are 360 degrees around a circle. If you didn’t believe this, oh, well, you flunked, and you didn’t end up in b-school. Maybe you became an artist instead.

Then we were told that any inscribed triangle with a base passing through the center, aka on the diameter, would have a right angle opposite that base. Starting with a circle, this is easy. Starting with an arbitrary right triangle, rather than a circle, means we have to know a lot more before we can inscribe that right triangle. It’s still simple, but it’s more complicated to start with the triangle. And, if it isn’t a right triangle, we can still inscribe it, but being on the diameter is another matter.

So we have a situation that lets us talk about 360 degrees and another that lets us talk about 90 degrees.

Then, we ditched the circle with the triangle postulate, which told us that a triangle’s angles add up to 180 degrees. We, again, were forced to believe this or face grave consequences.

We went so far as to ditch the right angle as well. But, get this, then they forced trigonometry on us. Oh, we also got rid of that diameter as well. So we end up with two more kinds of angles: acute and obtuse. Still, the process to draw a circle around a triangle worked, circumscribing a triangle. But, we then faced an explosion in the kinds of centers.

Did you catch that terminology change? When we start with a circle, we inscribe the triangle. When we start with the triangle, we circumscribe that triangle. Two different situations. Maybe AutoCAD makes that clear, I don’t know. It matters. It just doesn’t matter yet, so expect an Agilist to jump on the elementary, nearer one and object to implementing the further one. Oh, well, we can refactor later.

That explosion of centers illustrates a concept that translates well to the base of a normal distribution. The centers, namely the centroid, the circumcenter, and the incenter, show up in different orderings for different triangles. For equilateral triangles, the centers are one and the same point. When the normal distribution is indeed normal, all three statistical centers, the mean, median, and mode, show up at the same point.
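A minimal sketch of that last point, with made-up samples and only the standard library: a symmetric sample stacks its three centers on one point, like the centers of an equilateral triangle; a skewed sample pulls them apart into an ordering, like the centers of a scalene triangle.

```python
from statistics import mean, median, mode

# A symmetric sample: all three centers land on the same point.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(mean(symmetric), median(symmetric), mode(symmetric))  # all three equal 3

# A skewed sample: the centers separate and take on an ordering.
skewed = [1, 1, 1, 2, 2, 3, 4, 5, 9]
print(mean(skewed), median(skewed), mode(skewed))  # mean > median > mode
```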

Moving up to quadrilaterals, we add a point and a line and start to run through the geometries of four-sided things. We can circumscribe squares, rectangles, and all the “pure” forms, the ideals again. But, the defects have us skewered on the fork at the question of the circle first or later.

So we’ll walk through that argument next, starting with an inscribed cyclic quadrilateral. We’ll draw the circle first, aka using the Euclidean cheat.

“If you’re given a convex *quadrilateral*, a circle can be *circumscribed* about it if and only if the *quadrilateral* is cyclic.” That quote is from Stack Exchange, via a Google search. Yes, a cyclic quadrilateral is what started us off with trying to circumscribe a quadrilateral. By assuming it was cyclic, we got it done, but we set ourselves up with the needle from the haystack, so we didn’t have to find the needle. Always bring your own needle to a haystack. That needle is Euclidean space. We assume that if-and-only-if rule when we start with a circle. That center is the center of the circle we drew. The center our procedure finds is not the center of that circle.

We’ll look at the intersection of the diagonals. But, before we get there, notice that none of the angles are right angles, and none of the diagonals are diameters. They would have to pass through the center of the circle, that black dot. We’ve already departed Euclidean space.

The intersection of the diagonals is not the correct procedure for finding the center of the circumscribing circle. The process I referred to earlier was one of finding the midpoints of the sides and joining the midpoints of the opposite sides. I did it in MS Paint. That all by itself introduces errors. But, by and large, the procedure’s error is larger than the error of MS Paint’s bitmap resolution, aka the error of quantization. We got closer, but still wrong, and obviously so.

The math of the thing assumes that the circumscribed circle would be in Euclidean space, so opposing angles, those connected by diagonals in the earlier figure, would add up to 180 degrees. I don’t know the number of degrees, but we can still add them up visually.

The area in question is a matter of whether it is equal to the area marked c. If we stick with a Euclidean space, the two have to add up to 360 degrees.

Cyclic means that opposing angles add up to 180 degrees. Not being cyclic means we cannot circumscribe the quadrilateral in Euclidean space. Not adding up to 180 degrees means we are short a few degrees, aka we are in hyperbolic space, or we are over a few degrees, aka we are in spherical space. We’ve seen this before with our evolution to the normal distribution.
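That opposite-angle test can be run on coordinates. A sketch, with my own example vertices: points placed on a circle pass the 180-degree test, while a freehand convex quadrilateral fails it, one pair over and one pair under, the over/under that the post reads as spherical versus hyperbolic.

```python
import math

def interior_angles(quad):
    # Angle at each vertex, from the vectors to its two neighbors.
    angles = []
    n = len(quad)
    for i in range(n):
        (px, py), (qx, qy), (rx, ry) = quad[i - 1], quad[i], quad[(i + 1) % n]
        v1 = (px - qx, py - qy)
        v2 = (rx - qx, ry - qy)
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        norm = math.hypot(*v1) * math.hypot(*v2)
        angles.append(math.degrees(math.acos(dot / norm)))
    return angles

def opposite_sums(quad):
    a = interior_angles(quad)
    return a[0] + a[2], a[1] + a[3]

# Vertices on a circle: both opposite pairs sum to 180 degrees (cyclic).
cyclic = [(math.cos(t), math.sin(t)) for t in (0.2, 1.4, 3.0, 5.0)]
print(opposite_sums(cyclic))     # both ~180

# A freehand convex quadrilateral: one pair over 180, one under,
# while the four angles still total 360.
freehand = [(0, 0), (1, 0), (1, 1), (0, 2)]
print(opposite_sums(freehand))   # about 225 and 135
```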

Keep in mind that we are just drawing. We didn’t measure. We didn’t pull out a protractor. We are not running around trying to add up degrees and seconds. We are not navigators trapped in Mercator geometry. Oh, projection!

As product managers, we project profits in L2, and that is why we never try to innovate discontinuously. We are projecting in L2, but the underlying space is not Euclidean, so those nice numbers don’t tell us the truth. The underlying space is almost never Euclidean.

But, the point of this post is to find out whether space is hyperbolic or spherical. If the sum of the opposite angles on B were more, then we are hyperbolic in one dimension and spherical in the other. Then, we have to know where the transition happens. Then, we would like to know what the rates of transition would be.

We see the same thing in our data collection towards a normal. We are asking the same questions. There we can see our skew and kurtosis and our rates of data collection. I’m not p-hunting. I’m hunting for my normal. I’m hunting for my short tail so I can invest with some expectations about risk, about stability. Be warned, there are plenty of short tails in the sense of belief functions in fuzzy logic, in the swamped-by-surface sense. There is a geology-like structure to probability densities, but we hide those from ourselves with our dataset practice, a practice about preventing p-hunting. We are not p-hunting. We are looking for investments hiding in “bad” numbers, numbers that appear bad because we insist on L2, thus we insist on the 10th-grade Euclidean space of high school. Nowadays, even that space is not strictly Euclidean.

I’ve hinted at heterogeneous spaces. Trying to circumscribe a freely drawn quadrilateral reveals how space transitions to and from geometries generating homogeneous spaces.

“Because in reality there does not exist isolated homogeneous spaces, but a mixture of them, interconnected, and each having a …”

a quote in the Google search results citing *An Introduction to the Smarandache Geometries*, by L. Kuciuk and M. Antholy.

We do business in those spaces, spaces where the ideal, the generic, are fictions. Discontinuous innovation happens in hyperbolic space. Continuous innovation happens in Euclidean and spherical spaces, with the spherical being the safest bet and the hyperbolic being the riskiest. We no longer invest in discontinuous innovation because we believe it is risky. It appears that such investments would offer little return because in hyperbolic space the future looks smaller than it will be.

And, in the sense of running a business, “One geometry cannot be more valid than another; it can only be more convenient.” *Henri Poincaré (1854 – 1912), Science and Hypothesis (1901).*

So at last, we will encircle an indigenous quadrilateral in the wild. This is probably not a good example since it is concave. The circles generate a conic, but the points of the quadrilateral that are on the circles hint at a more complex shape. The circles hint that the geometry changes when we move from one circle to another. The smallest circle gives us three spherical geometries; the largest circle, only one. The smallest circle gives us no hyperbolic geometries; the largest circle, two hyperbolic geometries.

Given that most companies work in the late mainstreet adoption phase selling commodities from within a spherical space, we rarely dip into the hyperbolic space, except when we undertake a sampling effort towards the normal. In that effort, we might jump the gun and infer before we are back in Euclidean or our more native spherical space. So much for that inference. It will be fleeting. Know where you are Euclidean: is it a normal with three sigmas, or a normal with six sigmas? It isn’t a normal with twenty sigmas. Your choice. It is not Euclidean when it is not yet normal, aka when it is skewed and kurtotic.

It makes me wonder if the replication crisis is an artifact of inferring too early. A statistician out on Stack Exchange insisted that you can infer with kurtotic and skewed distributions. Not me. Is the inference fleeting, a throwdown, or a classroom exercise?

Anyway, back to those rates, those differentials, which takes us to two more diagrams. The first one shows us what happens as we achieve the Euclidean, aka the cyclic; the next one, we achieve the spherical.

Enough already. Thanks. And, enjoy!

]]>

In the above figure, I annotated several events. Gradient descent brought the system to the optimum. Much like a jar of peanut butter getting on the shelf of your grocery store, there was much involved in achieving that optimum. Call that achievement an event.

Here I’ve annotated the intersections as events. On one side of the intersection, the world works one way, and on the other side, the world works another way. The phases of the technology adoption lifecycle are like that. Each phase is a world. In the figure here, all I can say is that the factors have changed their order and scale. These changes apply over an interval. That interval is traversed over time. Then, the next interval is traversed. Consider the intervals to be defined by their own logic. The transitions can be jarring. As to the meanings of these worlds, I’ll have to know more before I can let you know.

John Cook tweeted about a post on estimating the Poisson distribution from a normal. That’s backward in my thinking. You start collecting data, which initially gives you a Poisson distribution, then you approximate the normal long before normality is achieved. Anyway, Cook’s post led me to this post, *“Normal approximation to logistic distribution”*. And, again, we are approximating the logistic distribution with a normal. I took his figure and used it to summarize the effects of changes to the standard deviation, aka the square root of the variance.
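An aside on that direction, Poisson first, normal later: it can be sketched numerically. Under the usual matching of mean and variance, approximating Poisson(λ) by N(λ, λ), the worst pointwise gap shrinks as λ grows, which is the sense in which the normal only arrives after enough data accumulates. The λ values below are my own.

```python
import math

def poisson_pmf(k, lam):
    # Computed in log space so large lam doesn't overflow.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def max_gap(lam):
    # Worst pointwise disagreement between Poisson(lam) and its
    # matching normal N(lam, lam), checked over a generous range.
    return max(abs(poisson_pmf(k, lam) - normal_pdf(k, lam, math.sqrt(lam)))
               for k in range(int(lam + 10 * math.sqrt(lam)) + 1))

# Early data (small lam): the Poisson is still skewed and the fit is loose.
# Later (large lam): the skew dies away and the approximation tightens.
for lam in (2, 10, 50, 200):
    print(lam, max_gap(lam))
```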

The orange circles are not accurate. They represent the extrinsic curvature of the tail. The circle on the right should be slightly larger than the circle on the left. The curvature is the inverse of the radius. The standard deviations are 1.8 for the approximating normal on the left, and 1.6 for the approximating normal on the right. The logistic distribution is the same in both figures.

On the left, the approximation is loose and leaves a large area at the top between the logistic distribution and the approximating normal. As the standard deviation is decreased to the optimal 1.6, that area is filled with some probability mass that migrated from the tails. That changes the shape of the tail. I do not have the means to compute the tails accurately, so I won’t speak to that issue. I draw to discover things.
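The tightening itself is easy to check, though. A sketch, assuming the standard logistic distribution and the two standard deviations from the figures, with 1.6 being roughly the optimal value:

```python
import math

def logistic_pdf(x):
    # Standard logistic density; peak of 0.25 at x = 0.
    e = math.exp(-abs(x))
    return e / (1 + e) ** 2

def normal_pdf(x, sigma):
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def sup_gap(sigma, span=10.0, steps=4001):
    # Worst-case vertical gap between the two curves over [-span, span].
    xs = (span * (2 * i / (steps - 1) - 1) for i in range(steps))
    return max(abs(logistic_pdf(x) - normal_pdf(x, sigma)) for x in xs)

# The loose fit (sigma = 1.8) leaves a visible gap at the peak;
# tightening sigma toward 1.6 shrinks the worst-case gap.
print(sup_gap(1.8))
print(sup_gap(1.6))
```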

The logistic distribution is symmetric. And, the normal that Cook is using is likewise symmetric. We are computing these distributions based on their formulas, not on data collection over time. From my earlier discussions of kurtosis, we know that while data is being collected over time, kurtosis goes to zero. That gives us these ideal distributions, but in the approximation process much is assumed. Usually, distributions are built around the assumptions of a mean of zero and a standard deviation of one. I came across a generalization of the normal that used skew as a parameter.

It turns out that the logistic distribution is subject to a similar generalization. In this generalization, skew, or the third moment is used as a parameter. These generalizations allow us to use the distributions in the absence of data.

Skew brings kurtosis with it.
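That claim can be checked numerically. Here is a sketch using the Azzalini skew-normal density, 2φ(x)Φ(αx), as a stand-in for the generalized distributions discussed here; the α values are my own. Dialing in skew via α drags excess kurtosis along with it:

```python
import math

def skew_normal_pdf(x, alpha):
    # Azzalini's skew-normal: 2 * phi(x) * Phi(alpha * x).
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    big_phi = 0.5 * (1 + math.erf(alpha * x / math.sqrt(2)))
    return 2 * phi * big_phi

def moments(alpha, span=12.0, dx=0.001):
    # Skewness and excess kurtosis by brute-force numerical integration.
    n = int(2 * span / dx)
    xs = [-span + i * dx for i in range(n + 1)]
    ps = [skew_normal_pdf(x, alpha) * dx for x in xs]
    mean = sum(x * p for x, p in zip(xs, ps))
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))
    skew = sum((x - mean) ** 3 * p for x, p in zip(xs, ps)) / var ** 1.5
    kurt = sum((x - mean) ** 4 * p for x, p in zip(xs, ps)) / var ** 2 - 3
    return skew, kurt

print(moments(0))   # ~ (0, 0): plain normal, symmetric, mesokurtic
print(moments(5))   # skew ~ 0.85 and excess kurtosis ~ 0.71 arrive together
```

Setting the skew parameter alone is enough to make the distribution leptokurtic, which is the sense in which skew brings kurtosis with it.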

In the first article cited in this post, the one that mentions Bayes, a Bayesian inference is seen as a series of distributions that arrive at a truth in a Douglas MacArthur island hopping exercise, or playing a game of Go where the intersections are distributions. It’s all dynamic and differential, rather than static in the dataset view that was created to prevent p-hunting, yet p-hunting has become the practice.

So these generalizations found skew to be an important departure from the ungeneralized forms. So we can look at these kurtotic forms of the logistic distribution.

Here, shown in black, we can see the ungeneralized form of the logistic distribution. It has two parameters: the mean and the standard deviation. The generalization adds a third parameter, skew. The red distribution has a fractional skew that is less than one. The blue distribution has a skew greater than one. Kurtosis is multiplicative in this distribution. The kurtosis orients the red and blue distributions via their long and short tails. Having a long tail and a short tail is the visual characteristic of kurtosis. Kurtotic distributions are not symmetrical.

Kurtosis also orients the normal. This is true of both the normal and the generalized skew-normal. In the former, kurtosis is generated by the data. In the latter, kurtosis is generated by the specification of the skew parameter. The latter assumes much.

It would be interesting to watch a skew-normal distribution approximate a skew-logistic distribution.

The three distributions in the last figure illustrate the directionality of the kurtosis. This kurtosis is that of a single dimension. When considered in the sense of an asymmetrical distribution attempting to achieve symmetry, there is a direction of learning, the direction the distribution must move to achieve symmetry.

We make inferences based on the tails involved. Over time the long tail contracts and the short tail lengthens. Statisticians argue that you can infer with kurtotic distributions. I don’t know that I would. I’d bet on the short tails. The long tails will invalidate themselves as more data is collected. The short tails will be constant over the eventual maturity, the differential achievement of symmetry, or the learning of the distribution.

This learning can be achieved when product developers learn the content of their product and make it fit the cognitive models of their users, when marcom, training, and documentation enable users to learn the product, and lastly, when we change the population so its members more closely fit the idealized population served by the product. All three of these learnings happen simultaneously, and optimally without conflict. Each undertaking would require data collection. And, the shape of the distribution of that data would inform us as to our achievement of symmetry, or the success and failure of our efforts.

The technology adoption lifecycle informs us as to the phase, or our interval and its underlying logic. That lifecycle can move us away from symmetry. We have to learn back our symmetry. The pragmatism that organizes that lifecycle also has effects within a phase. This leaves us in a situation where our prospects are unlike our customers or installed users. Learning is constant so divergence from symmetry is also constant. We cannot be our pasts. We must be our present. That is hard to achieve given the lagging indications of our distributions.

Enjoy!

]]>When I wrote Yes or No in the Core and Tails III, the variance was obvious in the diagram on minimization in machine learning, but the bias was not. I had thought all along that not filling the entire tree should have made the distribution skewed and kurtotic. But the threshold to having a normal distribution is so big, 2^11 = 2,048, that we are effectively dividing the skew and kurtosis numbers by 11, or more generally by the number of tiers in the binary tree. That makes the skew and kurtosis negligible. So we are talking about 248/2048 ≈ 0.121.

Enjoy.

]]>