Archive for December, 2017

Pragmatism

December 10, 2017

Pragmatism organizes the technology adoption lifecycle (TALC). The TALC is usually represented by a set of normal distributions summed into the single normal we use to summarize what's going on. In that summary we see the phases, the larger-scale pragmatism outcomes, but not the smaller-scale pragmatism outcomes within the phases, the pragmatism slices.

To begin at the beginning: when we don't have a sensor that can detect a signal, we illuminate. Otherwise, we go straight to the sensor, which gives us data in some range. We might have to clean it up. For a normal distribution or a Poisson distribution, we count how often each value occurred, or count the arrivals of values.

Eventually, we end up with a distribution, an envelope for randomness. That distribution houses the "noise." We capture the data points, then summarize them into the parameters that determine the shape of the distribution we are using to describe our data. A normal takes just two parameters: the mean and the standard deviation. With three pairs of numbers, we have the three normals of the TALC covered.
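
As a minimal sketch of that summarizing step, here is a batch of raw data points collapsing into the two parameters of a normal; the data itself is simulated for illustration.

```python
# A minimal sketch: summarizing raw data points into the two parameters
# of a normal distribution. The data here is simulated, not real.
import numpy as np

data = np.random.default_rng(seed=0).normal(loc=10.0, scale=2.0, size=500)

mu = data.mean()            # first parameter: the mean
sigma = data.std(ddof=1)    # second parameter: the standard deviation

print(f"summary of the noise envelope: mean={mu:.2f}, sd={sigma:.2f}")
# Three such (mean, sd) pairs would cover the three normals of the TALC.
```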

The TALC is a system built on noise. Yes, sorry, but sales is a random process. Marketing likes to think of itself as a methodical organization. Marketing discovers prospects, nurtures them, uncovers the buying process and the participants in the buy, and once the nurturing process has moved all of those participants into the "I want this" state, it sets the appointment for the sales rep. Then, sales throws that lead in the trash.

While marketing was busy with all of that, sales picked up the phone and random walked themselves to revenue. And, finally, having sold, management tells the sales rep that they can’t do the deal because the prospect is an outlier. Just another day in the war between marketing and sales.

The TALC is anything but random. The TALC is a highly organized stochastic system. It's like radar. A radar sends out noise in a given distribution, a physical one. Only the frequencies that fit in the pipe make it to the antenna, where they are transmitted. Then they bounce off things and get back to the antenna, where they again have to fit in the pipe. Outliers are trashed. In a company, an outlier prospect would move the population mean too far at too high a cost, so the company refuses to sell to them right now. A few years from now, that too-far-at-too-high-a-cost problem will be so yesterday.

Marketing already knew that. But, marketing is not random. Marketing has to be pragmatic when it faces a population organized by pragmatism. All that population wants is a business case that makes the buy reasonable. Reasonable is the real organizer. Jones bought this and got a hell of a success from it. But, you know us, we are not like Jones at all. Jones is an early adopter. We wait, not long, but we wait. We want to see the successes of businesses like ours. Jones is too early for our tastes. Just like sales is too early.

That the TALC is based on a summed set of normal distributions doesn't help either. Those normals make this a stochastic system. The prospects do a random walk towards us. And we do a random walk out to qualified prospects. "Qualified" filters those prospects. But so does pragmatism.

I came across the "Markov Chains: Why Walk When You Can Flow?" post on the Elements of Evolutionary Anthropology blog. Twitter random walks all of us. That post is about random walks.

The author started with an application demonstrating a random walk under a normal distribution. He shows the next attempted step in the random walk with a vector that is either red for failure or green for success. When the vector is green, the next step is taken, which results in a new data point being added to the distribution. When the vector is red, the data point is not added to the distribution, and another step is attempted.

Random Walk Metropolis-Hastings Normal
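
As a rough sketch of that accept-or-reject mechanic, here is a random-walk Metropolis-Hastings loop targeting a standard normal. The step size and the target are my own assumptions, not the blog author's code.

```python
# A minimal sketch of the random-walk Metropolis-Hastings step described
# above: propose a step, then either accept it (the "green" vector) or
# reject it and stay put (the "red" vector).
import numpy as np

rng = np.random.default_rng(1)

def target_pdf(x):
    """Unnormalized standard normal density."""
    return np.exp(-0.5 * x * x)

def metropolis_hastings(n_steps=5000, step_size=1.0):
    x = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.normal(scale=step_size)      # attempted step
        accept_prob = min(1.0, target_pdf(proposal) / target_pdf(x))
        if rng.uniform() < accept_prob:                 # "green": take the step
            x = proposal
        # "red": the proposal is discarded and another step is attempted
        samples.append(x)
    return np.array(samples)

samples = metropolis_hastings()
print(samples.mean(), samples.std())   # should land near 0 and 1
```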

I annotated the author’s figure to show where the outliers sit, the Markov chain underlying the Metropolis-Hastings random walk, and the TALC phases.

 

Random Walk Metropolis-Hastings Normal Annotated

On the y-axis normal, I indicated where the data generated by the random walk are either over or under the expected frequencies. Then I added a hypothetical path via the green vectors. I colored the outliers in gold, but later realized that there were more outliers beyond the six sigmas of the normal representing the TALC. I used the red circle to divide these additional outliers from the non-outlier tail of the normal.

Then, I labeled the TALC. That labeling might be unfamiliar. From the left, EA is the early adopter; C is the Chasm; V is a vertical market. The bowling alley (BA) is made up of the early adopters and their verticals. The Chasm guards entry into the vertical. The technical enthusiasts (TE) are present across the TALC, not just at the beginning, so they have their own layer, and that layer includes the cloud form factor (C). The cloud population was formerly considered to be phobics (P), or non-adopters, but the disappearance of the technology and its admin-free, infrastructural, aka somebody else's problem, presentation fits the needs of phobics. Then, picking up again to the right of the vertical phase, we enter, via the tornado (T), the early mainstreet (EM), otherwise called the horizontal (H) or IT horizontal phase. Next, we enter the late mainstreet (LM), otherwise called the consumer phase. We exit the late mainstreet in one of three ways: through M&A, through a second tornado (T), or by moving through or to the form factors of the device (D) phase and the cloud (C) phase. NA here means non-adopter.

We may extend the life of the category by going downmarket. The gray outermost circle represents the extent of the downmarket move. This is where Christensen disruptions live, in the downmarket. They live elsewhere as well, but all of them are firmly anchored in the late mainstreet or consumer phase. Foster disruptions require discontinuous invention and innovation prior to the technical enthusiast phase.

I further illustrated progress through the TALC with thick red and blue arrows. Discontinuous innovations need the full pathway, starting with the technical enthusiast (TE) phase. Continuous innovations can start anywhere. These days it is typical to start in the late mainstreet (LM), leaving a lot of money on the table, but the VCs investing there only know that phase, so they do not reap the returns that paid for everyone else. Cash is the game in the late mainstreet. B-schools preach the late mainstreet with its steady long-term commodities and the sport of competition.

The extent of the downmarket is shown with the light blue horizontal lines and the angled line that denotes the end of the category. The line the company going downmarket ends up on depends on how far downmarket they went. The end of the category depends on the extent of the downmarket move as well.

The author talks about the efficiency of the next step in the Markov path and how one explores only the areas under the normal that need to be explored. So his next figure takes a random walk around a narrow ring under the normal.

Random Walk Metropolis-Hastings Ring

In this figure, you see one phase of the TALC rotated around under the normal. This would be the technical enthusiasts in their phase and the phobic or cloud phase. We find the next data point less often, but the frequency of a given data point would be the same as if the full normal were used; the overall process is faster because the area being explored is smaller.


So the math works out to be A = π(R² − r²) vs. A = πR², which means that the ring does not take as long to compute. But in a stochastic system, the random number generator knows nothing of rings, so many numbers get generated and thrown away unused. Smaller targets are harder to hit.
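
A quick sketch of that trade-off, with made-up values for R and r: the ring's area is smaller, but a naive generator that knows nothing about rings throws most of its candidates away.

```python
# A minimal sketch comparing the ring's area, A = pi*(R^2 - r^2), to the
# full disk's, A = pi*R^2, and counting how often a blind random number
# generator actually lands inside the ring.
import math
import random

R, r = 1.0, 0.8
print("disk area:", math.pi * R**2)
print("ring area:", math.pi * (R**2 - r**2))

random.seed(2)
hits, trials = 0, 100_000
for _ in range(trials):
    x, y = random.uniform(-R, R), random.uniform(-R, R)
    if r**2 <= x * x + y * y <= R**2:   # candidate lands inside the ring
        hits += 1
print("fraction of candidates that hit the ring:", hits / trials)
```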
I annotated this one as well. There is a lot going on in that ring.
Random Walk Metropolis-Hastings Ring Annotated
The normal distribution in the ring is a circular normal. With a non-circular normal, the normal would be skewed until the density was consistent throughout the ring. That the distribution is not normal across the entire topology leaves us with skew and kurtosis. For the time being the distribution is trimodal. And those individual modes are interspersed with Poisson arrivals that eventually tend toward, and achieve, a normal. Those Poisson distributions occur in the still-empty areas of the ring.
Again, I’ve color-coded the areas under the distribution as being over or under the frequencies intended by the target distribution.

A Pragmatism Slice
This figure shows us what a pragmatism slice looks like. But, in the TALC, we haven’t gone far enough in defining the target area yet.
A Pragmatism Slice 2
Here I went back to the TALC and focused on the technical enthusiasts (TE) at the beginning of the TALC and on those last two phases beyond the late mainstreet (LM), the phobic (P) and laggard (L), or device (D) and cloud (C), phases. There are real differences in mission between the early and late phases. There are real differences in outcomes, such as an IPO premium for the early phases and no such thing for the late phases. The early TEs play with the technology. The late TEs migrate the product to the new form factors. The late TEs might have to develop a product for the company that eventually acquires the TE’s company. Macromedia developed Captivate to this end. So these different times are looking for very different target populations.
Each pragmatism ring serves different roles in the software-as-media model. Early TEs play with the carrier. Late TEs play with different form factors, different carriers. Late TEs also distribute components differently. Each phase has different expectations and different levels of task sublimation. Task sublimation would be counter to the needs of those in the early phases, yet essential to those in the late phases. The generic "task sublimation is good" finding is not so good as a generic piece of advice. Likewise design, or the notion that 1.0 functionality was awful. No, it wasn't awful. It served geeks just fine. We didn't ask developers to respect the carried domain, and really, we still don't. Observation and asking questions are insufficient for what needs to be achieved.
The functionality problem is still with us, unsolved. Hiring UX developers still leaves the non-UX developers to code their functionality as they please. They still don't do UX.
Those Poisson games played during the search for the next technology, those Poisson distributions, show up throughout the TALC any time we don't have a valid sample or a normal free of skew and kurtosis.
Attend to your pragmatism slices. Don't jump ahead and then jump back. You moved your normal, and normals don't go back well. Ask the next slice, the prospect slice, what they need. Do this independent of your install base, your customers. Those prospects will need something different from what your customers need. You might as well have different lists for each of those slices. The carrier and the carried slices would be different as well. The carried and carrier code really can't be written by the same developers. The disciplines being coded are too different. The carrier is easier than the carried. In general, we mess the carried up. We know the carrier. That's where most developers live.
Way back in the nascent internet days, a developer was all hot to write an electronic store, but when I asked him if he had ever worked in a store, he said no. He was enamored with the carrier and thought the carried would be easy. Sorry, but stores have managers that live stores. A database developer lives databases. A database can be a nice metaphor for a store, but that's poetry, not a store.
Enjoy your pragmatism slices. Don’t turn them into onions.
And, click the link and read the blog post. I haven’t read the whole thing yet.


From Time Series to Machine Learning

December 4, 2017

This post, “Notes and Thoughts on Clustering,” on the Ayasdi blog brought me back to some reading I had done a few weeks ago about clustering. It was my kind of thing. I took a time-series view of the process. Another post on the same blog, “The Trust Challenge–Why Explainable AI is NOT Enough,” boils down to knowing why the machine learning application learned what it did, and where it went wrong. Or, to make it simpler, why the weights changed. Those weights change over time, hence the involvement of time series. Clustering likewise changes in various ways as n, with n as time, changes; again, time series is involved.

Time is what blew up those supposedly random mortgage packages. The mortgages were temporally linked, not random. That was the problem.

In old 80s-style expert systems, the heuristics were mathematics, so for most of us the rules, the knowledge, were not transparent to the users. When you built one, you could test it and read it. It couldn't explain itself, but you, or someone, could. This situation fit rules 34006 and 32,***. This is what we cannot do today. The learning is statistical, but not so transparent, not even to itself. ML cannot explain why it learned what it did. So now there is an effort to get ML to explain itself.

Lately, I've been looking at time series in ordinary statistics. When you have fewer than 36 data points, the normal is a bad representation. The standard deviation expands and contracts depending on where the next data point lands. And the same data point moves the mean. Then there are skew and kurtosis. In finance class, there is skew risk and kurtosis risk. I don't see statistics as necessarily a snapshot thing, done only once you have a mass of data. Acquiring customers happens one customer at a time in the early days of a discontinuous innovation in the bowling alley. We just didn't have the computing power in the past to animate distributions over time or by each data point. We were asked to use the Poisson distribution until we were normal. That works very well because the underlying geometry is hyperbolic, which explains why investors won't put money on those innovations. The projections into the future get smaller and smaller the further out you go. The geometry hides the win.
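
A small sketch of how jumpy those summaries are at small n, using a simulated stream rather than real customer data: each new data point drags both the mean and the standard deviation around.

```python
# A minimal sketch of small-n instability: recompute the mean and the
# standard deviation as each new data point arrives, up to n=36.
import numpy as np

rng = np.random.default_rng(3)
stream = rng.normal(loc=100.0, scale=15.0, size=36)   # simulated arrivals

for n in range(2, len(stream) + 1):
    window = stream[:n]
    print(f"n={n:2d}  mean={window.mean():7.2f}  sd={window.std(ddof=1):6.2f}")
# The early rows swing hard; by the time n approaches 36 they settle down.
```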

It turns out there is much to see. See the “Moving Mean” section in the “Normals” post for a normal shifting from n=1 to n=4. Much changes from one data point to the next.

I haven’t demonstrated how clustering changes from one data point to the next. I’ll do that now.

Clustering DP1

At n=1, we have the first data point, DP1. DP1 is the first center of the first cluster, C1. The radius would be the default radius, used before any iteration of that radius to its eventual value. It might be that the radius is close to the data point, or at r=1.

The next data point, DP2, could have the same value as DP1. If so, the cluster will not move. It will remain stationary. The density of the cluster would go up, but the standard deviation would be zero.

Or DP2 could be different from DP1, so the cluster will move and the radius might change. A cluster can handily contain three data points. Don't expect to have more than one cluster with fewer than four data points.

Clustering DP2

At n=2, both data points would be in the first cluster. Both could be on the perimeter of the circle. The initial radius would be used before that radius is iterated. With two points, the data points might sit on the circle at its widest width, which implies that they sit on a line acting as a diameter of the circle, or they could be closer together, nearer the poles of the circle or sphere. C2 would be a calculated point, CP2, between the two data points, DP1 and DP2. The center of the cluster moves from C1 to C2, also labeled as moving from DP1 to CP2. The radius did not change. Here, both data points are on a diameter of the circle, which means they are as far apart as possible.

The first cluster, CL1, is erased. The purple arrow indicates the succession of clusters, from cluster CL1 centered at C1 to cluster CL2 centered at C2.

P1 is the perimeter of cluster CL1. P2 is the perimeter of cluster CL2. It takes a radius and a center to define a cluster. I've indicated a hierarchy, a data fusion, with a tree defining each cluster.

With two data points, the center, C2 or CP2, would be at the intersection of the lines representing the means of the relevant dimensions. And there would be a standard deviation for each dimension in the cluster.
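
A minimal sketch of that two-point case, with made-up point values: the center lands at the per-dimension means, and each dimension carries its own standard deviation.

```python
# A minimal sketch of the n=2 case: the new center sits at the
# per-dimension means of DP1 and DP2, the radius is half the distance
# between them (both points land on a diameter), and each dimension
# gets its own standard deviation.
import numpy as np

dp1 = np.array([2.0, 3.0])
dp2 = np.array([6.0, 7.0])
points = np.vstack([dp1, dp2])

center = points.mean(axis=0)               # intersection of the dimension means
radius = np.linalg.norm(dp2 - dp1) / 2.0   # both points sit on a diameter
per_dim_sd = points.std(axis=0, ddof=1)    # one standard deviation per dimension

print("center:", center, "radius:", radius, "per-dimension sd:", per_dim_sd)
```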

New data points inside the cluster can be ignored. The center and radius of the cluster do not need to change to accommodate these subsequent data points. The statistics describing the cluster might change.

A new data point inside the cluster might be on the perimeter of the circle/sphere/cluster. Or, that data point could be made to be on the perimeter by moving the center and enlarging the radius of the cluster.

The new data point inside the cluster could break the cluster into two clusters, both with the same radius. That radius could be smaller than the original cluster's. Overlapping clusters are to be avoided. All clusters are supposed to have the same radius. In the n=3 situation, one cluster would contain one data point, and a second cluster would contain two data points.

A new data point outside the current cluster would increase the radius of the cluster or divide it into two clusters. Again, both clusters would have the same radius. That radius might be smaller than the original cluster's.
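
Here is a toy sketch of that rule, not a standard library routine: a new point outside the cluster either grows the radius to reach it or splits things into two clusters of equal radius. The threshold and the split radius are my own assumptions.

```python
# A toy sketch of the rule above: absorb the new point by growing the
# radius, or split into two clusters that share the same, possibly
# smaller, radius. The max_radius threshold is an illustrative assumption.
import numpy as np

def absorb_or_split(center, radius, new_point, max_radius=5.0):
    """Return a list of (center, radius) clusters after seeing new_point."""
    dist = np.linalg.norm(new_point - center)
    if dist <= radius:
        return [(center, radius)]            # already inside: ignore the point
    if dist <= max_radius:
        return [(center, dist)]              # grow the radius so the point just fits
    # split: two clusters sharing the same, possibly smaller, radius
    new_radius = min(radius, dist / 4.0)
    return [(center, new_radius), (new_point, new_radius)]

clusters = absorb_or_split(np.array([0.0, 0.0]), 1.0, np.array([8.0, 0.0]))
print(clusters)   # two clusters, equal radius
```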

Clustering DP3

With n=3, the center of the new cluster, C3, is located at CP3. CP3 would be on the perimeter of the cluster formerly associated with the first data point, DP1. The purple arrows indicate the overall movement of the centers. The purple numbers indicate the sequence of the arrows/vectors. We measure radius 3 from the perimeter of the third cluster and associate that with CP3, the computed center point of the third cluster, CL3.

Notice that the first cluster no longer exists and was erased, but remains in the illustration in outline form. The data point DP1 of the first cluster and the meta-data associated with that point are still relevant. The second cluster has been superseded as well but was retained in the illustration to show the direction of movement. The second cluster retains its original coloring.

Throughout this sequence of illustrations, I’ve indicated that the definition of distance is left to a metric function in each frame of the sequence. These days, I think of distributions prior to the normal as operating in hyperbolic space; at the normal, the underlying space becomes Euclidean; and beyond the normal, the underlying space becomes spherical. I’m not that deep into clustering yet, but n drives much.
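
A minimal sketch of that pluggable metric, with function names of my own choosing; swapping the metric changes which points count as inside the cluster.

```python
# A minimal sketch of leaving the definition of distance to a metric
# function: the membership test takes the metric as a parameter, so each
# frame of the sequence could swap in a different one.
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def in_cluster(point, center, radius, metric=euclidean):
    """True when the point falls inside the cluster under the given metric."""
    return metric(point, center) <= radius

print(in_cluster((1.0, 1.0), (0.0, 0.0), 1.5))                    # True under Euclidean
print(in_cluster((1.0, 1.0), (0.0, 0.0), 1.5, metric=manhattan))  # False under Manhattan
```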

Data points DP1 and DP2 did not move when the cluster moved to include DP3. This does not seem possible unless DP1 and DP2 were not on a diameter of the second cluster. I just don’t have the tools to verify this one way or another.

The distance between the original cluster and the second was large. The distance is much smaller between the second and third clusters.

This is the process, in general, that is used to cluster those large datasets in their snapshot view. Real clustering is very iterative and calculation intensive. Try to do your analysis with data that is normal. Test for normality.
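
One way to run that test, as a sketch: SciPy's Shapiro-Wilk test on a simulated sample.

```python
# A minimal sketch of testing for normality before leaning on
# normal-based statistics, using SciPy's Shapiro-Wilk test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
sample = rng.normal(size=200)   # simulated data standing in for yours

statistic, p_value = stats.shapiro(sample)
if p_value < 0.05:
    print("reject normality: treat the normal summary with suspicion")
else:
    print("no evidence against normality: the normal summary is reasonable")
```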

When I got to the fourth data point, our single cluster got divided into two clusters. I ran out of time revising that figure to present the next clusters in another frame of our animation. I'll revise the post at a later date.

More to the point, an animated view is part of achieving transparency in machine learning. I wouldn't have enjoyed trying to see the effects of throwing one more assertion into Prolog and then trying to figure out what it concluded after that.

Enjoy.