June 12, 2016

In Graphs and Distributions, I mentioned that I was struggling with an idea that didn’t pan out. Well, the donut was the troublesome idea. I finally found an explanation of why hypothesis testing doesn’t give us a donut. The null hypothesis contributes the alpha value, a radius of the null, of the test. And, the alternative hypothesis contributes the beta value, a radius of the alternative, of the test. You end up with a lense, a math term, hence the spelling. Rotating that lense gives you the donut, as I originally conceived it.

In the process of trying to validate the donut idea, I read and watched many explanations of hypothesis testing. I looked into skew and kurtosis as well. I’ve mashed it up and put it into a single, probably overloaded diagram.

Donut 2

Here we have two normals separated by some distance between their means, as seen from above looking down. We test hypotheses to determine if a correlation is statistically significant. While correlation is not causation, causation would be a vector from the mean of one normal to the mean of another. The distance between the means creates statistical significance. Remember that statistics is all about distance.

In hypothesis testing, you set alpha, but you calculate beta. Alpha controls the probability of a false positive or type I error. Alpha rejects the tail and accepts the shoulder and core, shown in orange. Beta rejects the core and some portion of the shoulder towards the core or center, shown in yellow. Alpha and beta generate the lense shape, shown in green, representing the area where the alternative hypothesis is accepted.
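To make "you set alpha, but you calculate beta" concrete, here is a minimal sketch for one simple case: a one-sided z-test with a known standard deviation. The function name, the effect size, the sample size, and the choice of a z-test are all assumptions made for illustration; none of them come from the diagram.

```python
from statistics import NormalDist


def beta_for_one_sided_z_test(alpha, effect, sigma, n):
    """Type II error rate for H0: mu = mu0 versus H1: mu = mu0 + effect.

    Assumes a one-sided z-test with known sigma and n observations
    (an illustrative setup, not the only way to test a hypothesis).
    """
    std_normal = NormalDist()
    # Alpha is chosen up front; it fixes the rejection cutoff under the null.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Distance between the two means, measured in standard errors.
    shift = effect * (n ** 0.5) / sigma
    # Beta falls out: the mass of the alternative that lands below the cutoff.
    return std_normal.cdf(z_crit - shift)


if __name__ == "__main__":
    for alpha in (0.10, 0.05, 0.01):
        beta = beta_for_one_sided_z_test(alpha, effect=0.5, sigma=1.0, n=25)
        print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

Tightening alpha pushes the cutoff further into the tail of the null, which raises beta unless the distance between the means, or the sample size, grows to compensate.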

I drew the core touching the lense. This may not be the case. But, two authors/presenters stated that in hypothesis testing, the tails are the focus of the effort and the core is largely undifferentiated, aka not informative.

Then, I went on to skew and kurtosis. Skew moves the core. Kurtosis tells us about the shoulder and tail. The steeper and narrower the shoulder, the shorter the tail. This kurtosis is referred to as light. The shallower and wider the shoulder, the longer the tail. This kurtosis is referred to as heavy. Skewness is about location relative to the x-axis. Since the top down view is not typical in statistics, the need for a y- or z-axis kurtosis parameter gets lost, at least at the amateur level of statistics, aka the 101 intro class. On the diagram, the brown double-ended arrow should reach across the entire circle representing the footprint of the distribution.

Kurtosis and Tails

The volume under the shoulders and tails sum to the same value. The allocation of the variance is different, but the amount of variance is the same.
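A small sketch of "same amount of variance, different allocation": compare a normal distribution with a Laplace distribution scaled to the same variance. The Laplace is my choice of a heavy-tailed example, not something taken from the figure, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_tail_mass(k):
    """P(|X| > k standard deviations) for a normal distribution."""
    return 2 * (1 - NormalDist().cdf(k))


def laplace_tail_mass(k):
    """P(|X| > k standard deviations) for a Laplace distribution.

    A Laplace with scale b has variance 2*b**2, so matching a variance of
    sigma**2 means b = sigma / sqrt(2), and the two-sided tail mass beyond
    k*sigma is exp(-k * sqrt(2)).
    """
    return math.exp(-k * math.sqrt(2))


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(f"beyond {k} sd: normal={normal_tail_mass(k):.5f}  "
              f"laplace={laplace_tail_mass(k):.5f}")
```

Both distributions carry the same variance, but the Laplace puts less mass beyond one standard deviation and more beyond three, which is the sense in which heavy kurtosis moves probability out to the tails.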

One of the papers I read regarding kurtosis can be found here. This author took on the typical view of kurtosis as defining the core by looking at the actual parameters, which are parameters about tails, and concluded that kurtosis is about tails.

Notice also that the word shoulder cropped up. I first heard of shoulders in the research into kurtosis. Kurtosis defines the shape of the shoulders. As such, it would have effects on the distribution similar to that of black swans. It changes the shape of the distribution at the shoulders and tails. Tails, further, are not the same when the distribution is skewed, but somehow this is overlooked, because there is only one skew parameter, rather than two or more. This leaves an open question as to what would change the kurtosis over time. The accumulation of data over time would change the skew and kurtosis of the distribution.

Where black swans shorten tails by moving the x-axis up or down the y-axis, kurtosis changes would happen when the probability mass moves to and from the shoulders and tails.

Regression generates a function defined as a line intersecting the mean. In the multivariate normal, there are several regressions contributing to the coverage of the variance under the normal. These regressions convert formerly stochastic variations into deterministic values. Factor analysis and principal component analysis all achieve this conversion of stochastic variation into deterministic or algebraic values. These methods consume variance.
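For the single-variable case, both claims above can be checked directly: an ordinary least squares line passes through the point of means, and the share of variance it "consumes" is the usual R². This is a minimal sketch with made-up data and hypothetical function names; it says nothing about the multivariate case.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # Choosing the intercept this way forces the line through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def explained_fraction(xs, ys):
    """Fraction of the variance in ys accounted for by the fitted line (R^2)."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_residual / ss_total


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]             # illustrative data, not from the post
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    slope, intercept = fit_line(xs, ys)
    print(f"y = {slope:.3f} x + {intercept:.3f}")
    print(f"variance explained: {explained_fraction(xs, ys):.4f}")
```

Whatever is left over, one minus that fraction, is the part that stays stochastic.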

Due to the focus of hypothesis testing being in the tails, core variance is consumed or shifted towards the tails. Alpha defines an epsilon value for the limit of the normal convergence with the x-axis. Alpha essentially says that if the value is smaller than alpha, ignore it, or reject it. Alpha is effectively a black swan.

Since a factor analysis discovers the largest factor first, and increasingly smaller factors as the analysis continues, it constantly pushes variance towards the bottom of the analysis. The factor analysis also acts as an epsilon limiting convergence with the x-axis again, because we typically stop the factor analysis before we’ve consumed all the variance. We are left with a layer of determinism riding on top of a layer of the stochastic or variance. Bayesian statistics uncovers the deterministic as well.

To Tails

A radar is basically a bunch of deterministic plumbing for the stochastic and some mechanisms for converting the shape of the stochastic into deterministic values. This layering of determinism and stochastic is typical.

One term that showed up in the discussion of skewness was direction. Note that they are not talking about direction in the sense of a Markov chain. The Markov chain is a vector representing causation, whereas skewness does not represent causation.

The takeaway here should be that changes in skew and kurtosis will require us to retest our hypotheses just like the retesting caused by black swans. Data collection is more effective in the tails and shoulders than in the core if your intent is to discover impacts, rather than confirm prior conclusions.

Comments are welcome. Questions? What do you think about this? Thanks.






User Stories

May 9, 2016

Before the internet, back when geeks wrote software that they later sold to geeks, there was functionality. Designers today comment about how bad it was. There was no UX. There were no HF people or UI designers or designers of any kind (aka art people). Well, there were software designers (geeks). There were geeks and there were economic buyers. Those economic buyers were not users, but bosses of users usually separated by layers of other bosses.

If you were discontinuous, you didn’t sell to IT, so there was no requirements analyst. Later studies found that those analysts got in the way of collecting requirements, because they insisted that carrier trumped carried.

But somehow software got written and used, software companies sold software, and economic buyers got the competitive advantage they paid for. But, how? Technical writers had to turn functionality into user tasks, trainers had to do task analyses to find out what we now call “the job(s) to be done.” No ethnography was done either. Care was not taken to capture the cognitive model of the users. So, instead, the users were taught how to get from functionality to the tasks or jobs to be done. Users who knew how to think about their jobs and how to do their jobs were taught how devs would think and how devs would do their jobs. Obviously, the mismatch was huge. Unfortunately, the mismatch, the gap, is still there.

Technical writers had to go from a context ID referring to a particular dialog to a task, but only one of the tasks that could be done in the dialog or through it. There was no one dialog one task rule back then. So some tasks fell through the context ID gap. Even today a context ID does not refer to a user story.

Sales applied the feature-advantage-benefit (FAB) framework. Benefit translates to task/user story/job to be done. Sales reps can turn any feature into a FAB statement. Back in the day, everyone compensated for the developer. The developers didn’t notice most of the time.

Technical writers could turn any feature into a task. I remember one particular task in a manual, “Using Independent Disk Pools.” Beware. “Using” is not a task, and this task is a fake task. No user woke up in the morning thinking, “Hey, I get to use the independent disk pools feature today.” No. They woke up thinking, “Hey, I’ve got to set up geo mirroring today.” Setting up is a real task.

User stories can fail just like those feature-to-task conversions.

Agile succeeded in the internal IT context. But, when Agile escaped that context, I can’t say either way. The one vendor I worked with that was Agile failed. There was no communication outside the dev team, and that team was much smaller than the span of control of the VP of Dev. Other people still depend on clear communications about what is getting done. Agile made developers even more reclusive.

So when I encountered the user story tweet, I had had enough of the Agile evangelist. I need to know what the size of the typical deliverable will be, and when it will show up. If you can’t tell me that, you won’t be on my team. Agile, DevOps, this method, that don’t help me even if they make Agilists artists and uninterested in money.

Somehow Agilists are supposed to be ethnographers, marketers, and managers while keeping their coding skills up to date. The point of all the people making up the rest of the firm is specialization, knowledge, silos and all those things that make dev hard. Sorry, but I need devs that can code. I don’t believe they can do it all. Just like I don’t believe in the business generalist. Yes, your boss’s boss took your job 101 back in college. He knows it all. Hardly.

But, what about using user stories when developing architecture? What about that “ideal architecture?” There is no ideal. There is now and there is yesterday. There are today’s users and yesterday’s users. There are users seen through the lens of the technology adoption lifecycle phases. Then, there are users seen through the lens of the pragmatism steps that fragment the phases into tinier populations.

That pragmatism is a problem for marketing, sales, development and everyone else. When we operate on a scale wider than the pragmatism step, we tend to mix the populations and smudge the addressability and cognitive fitness of the software, marketing, and ultimately, the firm itself. On a pragmatism step, the population references its own members. The early adopters are not in the reference group. They are too weird, too early. The early adopters are on a different step, so their opinions and results don’t matter at all.

People in firms are on different pragmatism steps, so firms are respectively on different pragmatism steps. The people in firms refer to each other to the degree that they are on the same step. Likewise, firms tend to show up at tradeshows with the other firms in their reference base.

This makes the ideal a difficult proposition. A developer could get stuck on a particular pragmatism step. That developer could be very responsive to those users, which just serves to isolate the developer and the application they are developing.

Sales has to address the step of the current prospects. Marketing has to address the steps of the retained, incumbent customer, the step of the installing customer, and the step of the prospect. Way too much is going on. Likewise, the developer can’t just sum the distributions and hope for the best. The segmentations must be preserved.

Those pragmatism steps do give a product manager the ability to price the app for each independent population differently. Each group would never hear what the other group is paying since they don’t reference each other.

Aspect-oriented programming can help with all these segmentations by taking the segmentation into the code itself.

I’m very much a no tradeoffs, no averaging of functionality, no gaps in the cognitive models, and one use case to one feature person. Stop the smudging. Stop the averaging. Stop the trading off. Alas, I want too much.

Architecture is a developer-facing thing. Developers are the users of an architecture. Much of the ease of what developers do these days is due to the underlying architecture. The easier it is to do, the longer it’s been around in the stack. The longer it’s been in the stack, the less likely it’s got a user story written for it.

Much of what developers do these days is about coding carrier functionality. Drawing the lines between carrier and carried is difficult, but it gets harder when you’re drawing the line between carrier and the next layers of the carrier stack. Different populations own different portions of the stack, so there are different terminologies, cultures, perspectives, points of view. The user in the stack is a developer, but a different developer. Who is doing this? The 101 guy or the PhD in this? The developers that think their developer users are just like them are in for a shock. In the old days we could write an API and not worry about it being copied as long as it was easier to use than write or rewrite.

A clear definition of the user is essential. The user story is just part of getting to that clear definition. Keep in mind that form is not the issue.

There is the expert’s cognitive model. It has overcome all the plateaus. Each performance plateau constitutes a segmentation of the population. Not every user has encountered the trick to get beyond this or that plateau. Not everyone is an expert. An application built on that segmented cognitive model will also have to deal with the transitions between those levels of expertise. How will your users get from novice to mid-level performance? Where is the ideal here? The segmentation can help you keep ideal limited to one particular scope of the cognitive model.

The pragmatism steps get split into carried and carrier as well. Architecture gets split likewise, so the pragmatism segmentation plays here as well, keeping the carried expert clearly separated from the carrier expert. There are two pragmatism dimensions. Have fun with that.

I know I tweeted about other dimensions of the user story as pathway to the ideal architecture. But, it’s been a while.

It’s probably easier today, since the task analysis actually happens earlier before stuff gets written. The 101 ethnography gets done as well. We observe. We interview. But, we are not ethnographers. Spend the money. Encode the cognitive model. We don’t do that today. Instead, we rely on the developer’s idea and hope a population springs up around it. Lean checks that a population actually emerges. Not everything can be lean. Lean is where we are today on the technology adoption lifecycle. Lean would not have gotten us where we are today. We have the stack. We rely on that stack of ole.

Graphs and Distributions

April 28, 2016

Here I am struggling to make some ideas that looked interesting pan out. I’m starting into week three of writing this post when John D. Cook tweets a link to “Random is as Random Does,” where he reminds us that we are modeling deterministic processes with random variables. I’ve hinted towards this in what I had already written, but in some sense I’ve got it inside out. The point of statistical processes is to make deterministic the deterministic system that we modelled via randomness. I suppose I’ll eat a stochastic sandwich.

Browsing David Hand’s “The Improbability Principle” has me trying to find a reason why events beyond the distribution’s convergence with the x-axis happen. The author probably proposed one. I have not read it yet.

Instead, I’m proposing my own.

The distribution’s points of convergence delineate the extent of a world. But, even black swans demonstrate why this isn’t so. A black swan moves the x-axis up the y-axis and pulls the rightmost point of convergence closer to the mean, or into the present from some point in the future. If you projected some future payoff near the former convergence, well, that’s toast now. It’s toast, not because the underlying asset price just fell, but because the future was just pulled into the present.

When the x-axis moves up the y-axis, the information, the bits, below the x-axis are lost. The bits could disappear due to the evaluative function and remain in place as a historical artifact. In the real world, the bits under the x-axis are still present. The stack remains. Replacing those bits with bits having an improved valuation is a key to the future. But, the key to getting out beyond the convergence is understanding that there is some determinism that we have not yet modelled with randomness.

While I labeled the area below the x-axis as lost, let’s just say it’s outside consideration. It never just vanishes into smoke. Newtonian physics is still with us.

101 Black Swan

A few weeks ago, somebody tweeted a link to “A Random Walks Perspective on Maximizing Satisfaction and Profit” by M. Brand. Brand startled me when he talked about a graph in graph theory as being a collection of distributions. He goes on to say that an undirected graph amounted to a correlation, and a directed graph amounted to a causation. The problem is that the distributions overlap, but graph theory doesn’t hint at that. Actually, the author didn’t say correlation or causation. He used the verbiage of symmetric and asymmetric distributions.

So that left me wondering what he meant by asymmetric. Well, he said Markov chains. Why was that so hard? The vector on the directed graph is a Poisson distribution from the departure node to the arrival node, a link in a Markov chain. The cumulative distribution would be centered near the mean of the arrival node, but the tails of the cumulative distribution would be at the outward tails of the underlying distributions. The tail over the departure node would be long, and the tail over the arrival node would be more normal, hence the asymmetry.

In the symmetric, or correlation, case, the cumulative distribution is centered between the underlying distributions with its tails at the outward tails of the underlying distributions.

The following figure shows roughly what both cumulative distributions would look like.

102 Directed and Undirected Graphs and Distributions

The link in the Markov chain is conditional. The cumulative distribution would occur only when the Markov transition happens, so the valuation would oscillate from the blue distribution on the right to the gray cumulative distribution below it. Those oscillations would be black swans or inverse black swans. The swans appear as arrows in the following figures. Different portions of the cumulative distribution with their particular swans or inverse swans are separated by vertical purple lines.

103 Directed Graph Distributions and Their Swans

The conditional nature of the arrival of an event means that the cumulative distribution is short lived. A flood happens causing losses. Insurance companies cover some of those losses. Other losses linger. The arrival event separates into several different signals or distributions.

Brand also asserts that the cumulative distribution is static. For that summative distribution to be static, the graph would have to be static. Surprise! A graph of any real world system is anything but static.

A single conditional probability could drag a large subgraph into the cumulative distribution reducing the height of the cumulative distribution and widening it greatly.

104 Subgraphs

In the figure, two subgraphs are combined by a short-lived Markovian transition giving rise to a cumulative distribution represented by the brown surface. Most of the mass accumulates under the arrival subgraph.

Our takeaways here are that as product managers we need to consider larger graphs when looking for improbable events and effects. Graphs are not static. Look for Markov transitions that give rise to temporary cumulative distributions. Look for those black swans and inverse black swans. And, last, bits don’t just disappear. Bits in information physics position. Information replaces potential energy, so the mouse trap sits waiting for the arrival of some cheese-moving event, other things happen. A distribution envelops the mouse and then vanishes one Fourier component at a time.

But, forgetting the mouse, commoditization of product features is one of those black swans. This graph/distribution stuff really happens to products and product managers.

A Spatiotemporal View

March 10, 2016

A few days ago I pulled Using Business Statistics by Terry Dickey off the shelf of the local public library thinking it would be a quick review. It’s a short book. But, it took a different road through the subject.

Distance, the metric of a geometry, is the key idea under statistics. Like the idea that the distance of a z-score from the mean is measured in standard deviations. A standard deviation is an interval, a unit measure. Variance is a square, an area. And, area is the gateway to probability.
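As a worked example of the paragraph above, here is a minimal sketch that treats the standard deviation as the unit of distance, checks that the variance is that unit squared, and then reads a probability off as an area. The data set and the function names are made up for illustration, and the last step assumes the values are roughly normally distributed.

```python
from statistics import NormalDist, mean, pstdev, pvariance


def z_score(value, data):
    """Distance of value from the mean of data, in standard deviations."""
    return (value - mean(data)) / pstdev(data)


def area_below(z):
    """Probability mass to the left of z under a standard normal curve."""
    return NormalDist().cdf(z)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from the book
    print("mean:", mean(data))
    print("standard deviation (unit length):", pstdev(data))
    print("variance (unit area):", pvariance(data))
    z = z_score(9, data)
    print(f"z-score of 9: {z:.2f}")
    print(f"area to the left of that z: {area_below(z):.4f}")
```

The z-score is a plain distance on the x-axis; it only turns into a probability once you measure the area under the curve up to that point.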

Using the standard deviation as a unit measure, we can mark off the x-axis beyond the convergences of our distribution, and use that x-axis as the basis of our time series. I’ve used this time series idea under the technology adoption lifecycle (TALC), so looking at our past and our future as fitting under the TALC is typical for me.

That was the idea, so I tried it, but the technology adoption lifecycle is really a system of normal distributions spread out over time. The standard deviations for each of those normal distributions would be different. They would be smaller at first and larger later. The geometries for each of those normal distributions would be different as well.

Smaller and larger are relative to the underlying geometry and our point of view. In the early phases of the TALC, the geometry is hyperbolic. Now appears to be big, and the future appears to be smaller and smaller, so projections will underestimate the future. Hyperbolic geometries also give us things like a taxicab geometry with its trickier metric, which brings with it much risk, and the world lines of Einstein. Things are definitely not linear in hyperbolic geometries. Across the early phases of the TALC, the Poisson games of the early adopter phase tend to the normal, and the geometry achieves the Euclidean at the mean of the TALC. Moving into the later phase of the TALC, the geometry tends to the spherical. Spherical geometries require great circles, but provide many ways to achieve any result, so analyses proliferate, none of them wrong, which makes things less risky.

All of those geometries affect that unit measure on the x-axis.

Discontinuous populations generate multiple populations over the span of the TALC, so the statistic itself changes as well. That is what drives the proliferation of standard deviations. Our customer population is small and our prospect population large. The customer population grows with each sale, with each seat, and with each dollar, and similarly the prospect population shrinks with the same. It’s a zero sum game. The population under the TALC is fixed. That population is about the underlying enabling technology, not some hot off the presses implementation of a product or a reproduction. Products change as the adoption of the underlying technology moves across the populations of the TALC.

Big data with its machine learning will have to deal with the population discontinuities of reality. For now we will do it by assuming linearity and ignoring much. We already assume linearity and ignore much.

Across the TALC, pragmatism organizes the populations. That organization extends to organizing the customers as people and companies. Using negative and positive distances from the mean, similar to +/- standard deviations from the mean, we can place companies and their practices under the TALC. We could even go so far as to break an organization down to the individual executive and their personal locations on the TALC. Even an early adopter doesn’t hire a company full of early adopters.

Delivering functionality is an early phase phenomenon on the negative standard deviation side of the TALC. Design is a late phase phenomenon on the positive standard deviation side of the TALC.

B2B early adopter and crossing the chasm is early phase. But, why mention that? Well, I’m tired of hearing them show up on the opposite side of the mean out here on Twitter. The consumer facing SaaS vendor is not crossing the chasm. And, their early adopters are B2C. Confusion ensues. Place gets lost. I should ignore more.

Thanks to Jon Gatrell’s comment on The Gods Must Be Crazy post for pulling me back to this blog. Another recession has intervened in my job search, so I’m still looking, but there’s nothing to find, so there is no reason to focus on that search to the exclusion of writing this blog. Thanks for letting me know that someone is still reading. WordPress stats don’t tell us much.


More on Geometry

February 6, 2016

A few days ago, I dropped into B&N for a quick browse through the math section. There wasn’t much new there, so off to the business section. There was a new book about innovation (no, I didn’t write down a citation), innovation in the sense of it being a matter of the orthodoxy, aka nothing new in the book. It mentioned that collaborations between companies should create more value than the sum of the individual parts, aka synergy. A former CEO of ITT settled this synergy thing. He called it a myth.

Tonight, I came across another of Alexander Bogomolny’s (@CutTheKnotMath) tweets. This one showed how a cyclic quadrilateral or two triangles sharing a base would give rise to a line between the opposite vertexes, which in turn gives rise to a point E. See his figure.

I look at the figure and see two organizations, the triangles, sharing bits at the base. Those triangles represent decision trees. The base of such a triangle would represent the outcome of the current collection of decisions, which I’ve called the NOW line. The past is back towards the root of the decision tree, or the vertex of the triangle opposite the base.

It gave me a tool to apply towards this issue of synergy. To get that synergy, the triangles would position themselves on a base line where the bases of the individual triangles would overlap where they gave rise to those synergistic bits. But, they only overlap for a few bits, not all of them, as in that cyclic quadrilateral. I built some models in GeoGebra. I found the models surprising. I’m not a sophisticated user yet, so there are too many hidden lines.

I was asking those geometry questions that I mentioned a few posts ago, where I drew many figures about what circumstances give rise to non-Euclidean geometries. So as I played with my GeoGebra models, I was always asking where the diameter was, and that was not something GeoGebra does at the click of a button. It does let you draw a circumcircular sector, which looks like a pie with a slice removed, and draw a midpoint of the line opposite a given vertex. That was enough to give me a simple way of seeing the underlying geometry of a triangle. When half the pie is removed, a line between the two points on the circumference is the diameter of the circle, so the triangle is Euclidean. I may have said that a triangle is always Euclidean in earlier posts, but I can see now that I was wrong. To be Euclidean, the base of the triangle has to be on a diameter of the circle. A figure will clear this up.

I discussed my hypothesis in the previous post.

Three Geometries - Intuitive

The hypothesis was messy. I had triangles down as being locally Euclidean and globally possibly otherwise.

With the circumcircular sectors, the complications go away.

Three Geometries via Circumcircular Sectors

The new model is so much simpler.

I went on to look at two triangles that were not competitors. I looked at that synergy.

Potential Synergy

The red line represents the shared bits. The yellow shows the potential synergy. The gains from synergy, like the gains from M&As, show up in the analysis, but rarely in the actuals.

I went on playing with this. I was amazed how decisions far away could have population effects, and functionality effects. This even if they don’t compete on the same vectors of differentiation. But these effects are economic once the other organization is outside your category (macro). We only compete within a category (micro).

Population Effects

In this figure, the populations overlap in the outliers. The triangles don’t overlap. They are not direct competitors. They do not share a vector of differentiation. Point A is not on line DE.

The circles represent their populations. The relative population scale is way off in this figure. The late firm should have a population as much as 10 times larger than the early firm.

The problem with modeling two firms, placing them in space relative to each other, is that it means doing a lot of work on coordinate systems or using tensors. I started drawing a grid. I’ll get that done and look for more things to discover in this model. Enjoy!


Stuff Left Out

January 17, 2016

My review of math has me learning things that I was never taught before. There were concepts left for later, if later ever arrived.

We had functions. At times we had no solutions. Later, they become functions without the right number of roots in the reals, as some of the solutions were in the space of a left out concept, complex roots. For a while, after a deeper understanding of complex roots, I’d go looking for the unit circle, but what was that unit circle centered on? Just this week, I found it centered on c, a variable in those equations like ax² + bx + c. So that open question comes to a close, but is that enough of an answer? I don’t know yet.

We had trigonometry, because we had trigonometric functions. We have the unit circle and an equation, notice that it’s not a function, because as a function it would fail the vertical line test. When a function violates the vertical line test, we create a collection of functions covering the same values in a way that does not violate the vertical line test. Trigonometry does that for us without bothering to explain the left out concept, manifolds.

One of Alexander Bogomolny’s (@CutTheKnotMath) tweets linked to a post about a theorem from high school geometry class about how a triangle with its base as a diameter of a circle and a vertex on the circle had angles that added up to 180 degrees. Yes, I remembered that. I hadn’t thought about it in decades. But, there was something left out. That only happens in a Euclidean geometry.
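The flat-plane version of that construction is easy to check numerically. The sketch below places the base on the diameter of a unit circle, puts the third vertex on the circle, and measures the interior angles. Because it works in ordinary planar coordinates, it assumes Euclidean geometry from the start, so the 180 degree sum holds for any triangle here; what the diameter construction adds is the right angle at the vertex on the circle. The function names are hypothetical.

```python
import math


def angle_at(p, q, r):
    """Interior angle at point p, in degrees, of the planar triangle p, q, r."""
    v1 = (q[0] - p[0], q[1] - p[1])
    v2 = (r[0] - p[0], r[1] - p[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cosine = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def triangle_on_diameter(theta_degrees):
    """Angles of a triangle whose base is the diameter of the unit circle.

    The third vertex sits on the circle at the given angle from the x-axis.
    Returns (angle at left end, angle at right end, angle at the vertex).
    """
    left, right = (-1.0, 0.0), (1.0, 0.0)
    theta = math.radians(theta_degrees)
    vertex = (math.cos(theta), math.sin(theta))
    return (angle_at(left, right, vertex),
            angle_at(right, left, vertex),
            angle_at(vertex, left, right))


if __name__ == "__main__":
    for theta in (30, 60, 90, 135):
        a, b, c = triangle_on_diameter(theta)
        print(f"vertex at {theta:>3} deg: angles {a:.1f}, {b:.1f}, {c:.1f}, "
              f"sum {a + b + c:.1f}")
```

Moving the vertex inside or outside the circle in this flat model only changes the vertex angle to more or less than 90 degrees; the sum stays at 180, so testing the hyperbolic and spherical cases would need a different model than planar coordinates.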

Well, I doodled in my graphics tools. I came to hypothesize about where the vertex would be if it were inside the circle, but not on it, and the base of the triangle was on the diameter. It would be in a hyperbolic geometry was my quick answer; outside the circle, same base, spherical. Those being intuitive ideas. A trapezoid can be formed in the spherical case using the intersections of the circle with the triangle. That leaves us with the angles adding up to more than 180 degrees.

I continued to doodle. I ended up with the bases being above or below the diameter, and the vertex on the circle. Above the diameter, the geometry was hyperbolic; below, spherical. I got to a point using arc length where I could draw proper hyperbolic triangles. With the spherical geometries there were manifolds.

The point of all this concern about geometries involves how linear analyses break down in non-Euclidean spaces. The geometry of the Poisson games that describe the bowling alleys of the technology adoption lifecycle (TALC) is hyperbolic. The verticals and the horizontals of the TALC approach the normal. Those normals at say six sigma give us a Euclidean geometry. Beyond six sigma, we get a spherical geometry, and with it a redundancy in our modeling: twelve models, all of them correct, pick one.

We’ve looked at circles for decades without anyone saying we were constrained to the Euclidean. We’ve looked at circles without seeing that we were changing our geometries without thinking about it. We left out our geometries, geometries that inform us as to our nonlinearities that put us at risk (hyperbolic) or give us too much confidence (spherical). We also left out the transitions between geometries.

The circle also illustrates why asking customers about their use cases and cognitive models, or even observing, doesn’t get us far enough into what stuff was left out. Back in school, it was the stuff that was left out that made for plateaus, stalls, stucks, and brick walls that impeded us until someone dropped a hint that led to the insight that dissolved the impedances and brought us to a higher level of performance, or a more complete cognitive model.


January 8, 2016

I just finished reading a review of “Are we Postcritical?,” a book about literary criticism.  It mentions things like Marxism and Structuralism used as the basis of literary criticism. It goes further to mention a stack of critical frameworks.

These critical frameworks are applied to all art forms from literature to sculpture to architecture to “design” in its various outlets. You might find a Structuralist critique of the iPhone. Or, a Marxist critique of some computer game.

But, developers hardly go looking for the critical reviews of their own art. We hear it on Twitter. Designers tell us the UIs of the 80’s and 90’s suck. Well, sure, had we been trying to serve the same users we serve today. I’ve got news for you, they didn’t suck. They served the geeks they were intended to serve. Nothing today serves geeks. Sorry geeks, but we serve consumers today. So today, we can think about that same stack of critical frameworks that gets applied to “design.”

To get to this “design” as a form of art, we would have to be artists. Yes, some Agilists come across as artists in their demand to be free from management and economic considerations and purposes. But, back in the early phase of the technology adoption lifecycle this wasn’t so. Under the software engineering approach, we had a phase called “design!” Yes, we applied critical frameworks that had nothing to do with literary criticism that were just as critical as these art and artist-oriented critical frameworks.

Every domain has its own critical frameworks. Before we drop a bomb, we take a photo. We drop a bomb, then we send another drone to take a photo. That photo is graded via a critical framework, and bombing is assessed. We could if we like apply a structuralist framework during this assessment, but mostly we’ll ask “Hey do we need to send another drone to take the target out?” “No, it’s a kill. Tell PR there was zero collateral damage.” That being just an example of a geek’s critical framework.

More to the point we might say O(n), which becomes possible once we have true parallelism. Yes, everything will have to be rewritten to pass the critical framework of algorithm design. Oh, that word again, “design.”

Accountants have critical frameworks. Likewise, hardware engineers and software developers. They just don’t happen to be artist-oriented frameworks. It just kills me when an artist-designer puts down a design that was critiqued under some critical framework unknown to them. And, it kills me that these designers want to be the CEO of firms selling to consumers without recognizing all the other critical frameworks that a CEO has to be responsive to like those of the SEC, FCC, and various attorneys general.

Design is the demarcation of and implementation of a balance between enablers and constraints, be that with pastels and ink, plaster and pigments, or COBOL. The artist wants to send a message. The developer wants to ship a use case. What’s the difference? At some scale, none at all. Design, in its artistic/literary senses, has yet to reach the literate/artistic code of developers. Design, in that art sense, is an adoption problem. But, we do have computers doing literary criticism, so one day computers will do code criticism. Have fun with that.

And, yes, since this is a product management blog, some day we’ll have a critical framework for product management. We already do, but who in the general public will be reading that in their Sunday papers? Maybe we’re supposed to be reading about it in the sports section. The sportscaster calling out the use of, oh, that design pattern.

Yes, artist-oriented design is important, as is brand. Have fun there, but there is a place, and that place is in the late phases of the technology adoption lifecycle, not the phases that birthed the category and value chains that underlie all that we do today. Late is not bad, it’s just limited to the considerations of cash, not wealth–oh, an economic critical framework.






December 24, 2015

In this blog, I’ve wondered why discontinuous innovation is abandoned by orthodox financial analysis. Some of that behavior is due to economies of scale. Discontinuous innovation creates its own category on its own market populations. Discontinuous innovation doesn’t work when forced into an existing population by economies of scale, or economies of an already serviced population.

But, there is more beyond that. In “The Innovator’s Dilemma,” Christensen proposed separation as the means to overcome those economy-of-scale problems. Accountants asserted that it cost too much, so in the end his best idea did not get adopted. Did he use the bowling alley to build his own category and market population? No. Still, his idea was dead on, but not in the familiar way of separation as a spin out.

Further into the math, however, we find other issues, like how the underlying geometry of the firm evolves from a Dirac function, to a set of Poisson games, to an approach to the normal, to the normal, and then to the departure from that normal. The firm’s environment starts in a taxi cab geometry (hyperbolic) where linearity is fragmented, becomes Euclidean where linearity is continuous, and moves on to spherical where multiple linearities, multiple orthodox business analyses, work simultaneously.

With all these nagging questions, we come to the question of coordinate systems. Tensors were the answer to observing phenomena in different frames of reference. Tensors make transforms between systems with different coordinate systems simple. Remember that mathematicians always seek simpler. For a quick tutorial on tensors, watch Dan Fleisch explain tensors in “What’s a Tensor?”

In seeking the simpler, mathematicians start off hard. In the next video, the presenter talks about some complicated stuff, see “Tensor Calculus 0: Introduction.” Around 48:00/101:38 into the video, the presenter claims that the difficulties in the examples were caused by the premature selection of the coordinate systems. Cylindrical coordinates involve cylindrical math, and thus cylindrical solutions; polar, similarly; linear, similarly. Tensors simplified all of that. The solutions were analytical, thus far removed from the geometric intuition. Tensors returned us to our geometric intuitions.
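The worry about coordinate systems can be seen in a toy case that is far short of tensor calculus. This sketch computes the same distance in Cartesian and in polar coordinates. The formulas look nothing alike, which is the coordinate system talking, yet the answer is the same, which is the problem talking. The function names and sample points are illustrative assumptions, not anything from the video.

```python
import math

def to_polar(x, y):
    """Convert a Cartesian point to (radius, angle)."""
    return math.hypot(x, y), math.atan2(y, x)

def distance_cartesian(p, q):
    """Distance from the Pythagorean formula."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def distance_polar(p, q):
    """Same distance from the law of cosines, using polar inputs."""
    (r1, t1), (r2, t2) = p, q
    return math.sqrt(r1 * r1 + r2 * r2 - 2 * r1 * r2 * math.cos(t2 - t1))

a, b = (3.0, 4.0), (-1.0, 2.0)  # hypothetical sample points
d_xy = distance_cartesian(a, b)
d_polar = distance_polar(to_polar(*a), to_polar(*b))
```

The two results agree to floating point precision even though the code paths share nothing but the points.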

The presenter says that when you pick a coordinate system, “… you’re doomed.” “You can’t tell. Am I looking at a property of the coordinate system or a property of the problem?” The presenter confronts the issue of carried and carrier, or mathematics as media. I’ve blogged about this same problem in terms of software or software as media. What is carried? And, what is the carrier presenting us with the carried?

Recently, there was a tweet linking to a report on UX developer hiring vs infrastructure developer hiring. These days the former is up and the latter is down. Yes, a bias towards stasis, and definitely away from discontinuous innovation in a time when the economy needs the discontinuous more than the continuous. The economy needs some wealth creation, some value chain creation, some new career creation. Continuous innovation does none of that. Continuous innovation captures some cash. But, all we get from Lean and Open and fictional software is continuous innovation, replication, mismanaged outside monetizations and hype, so we lose to globalism and automation.

I can change the world significantly, or serve ads. We are choosing to serve ads.

Back to the mathematics.

I’m left wondering about kernels and how they linearize systems of equations. What does a kernel that linearizes a hyperbolic geometry look like? Spherical kernels likewise? We linearize everything regardless of whether it’s linear or not. We’ve selected an outcome before we do the analysis, just like going forward with an analysis embedding a particular coordinate system. We’ve assumed. Sure, we can claim that the mathematics, the kernel, makes us insensitive, or enables us to be insensitive, to the particular geometry. We assume without engaging in WIFs.

Kernels like coordinate systems have let us lose our geometric intuition.

There should be a way to do an analysis of discontinuous innovation without the assumptions of linearity, linearizing kernels, a Euclidean geometry, and a time sheared temporality.

Time sheared temporality was readily apparent when we drove Route 66. That tiny building right there was a gas station. Several people waited there for the next model-T to pull in. The building next to it is more modern by decades.

This is the stuff we run into when we talk about design or brand, or use words like early-stage–a mess that misses the point of the technology adoption lifecycle: only the late main street and later stages involve orthodox business practices typical of F2000 firms. That stuff didn’t work in the earlier phases. It doesn’t work when evaluating discontinuous innovation.


Is the underlying technology yet to be adopted? Does it already fit into your firm’s economies of scale? Wonder about those orthodox practices and how they fail your discontinuous efforts?





December 14, 2015

More statistics this week. Again, surprise ensued. I’ll be talking math, but thinking product management. I’ve always thought in terms of my controls and their frequency of use, but when did the data converge? When does it time series on me? Agile and Minimal Viable Product are experiment based. But, how deep is our data?

So while everything I’m going to say here is not new to anyone maybe, you’ll be surprised somewhere along the way.

First, we start with the definition of probability.

01 Probability 01

The stuff between the equal signs is predicate calculus, or mathematical logic, the easy stuff. It’s just shorthand, shorthand that I never used to get greasy with my notes. In college, I wanted to learn it, but the professor didn’t want to teach it. He spent half the semester reviewing propositional calculus, which was the last thing I needed.

Moving on.

01 Probability 02

What surprised me was “conditions or constraints.” That takes me back to formal requirements specification, in the mid to late 80’s, where they used IF…THEN… statements to prove the global context of what program proving could only prove locally. Requirements were questions. Or, Prolog assertions that proved themselves.

Constraints are the stuff we deal with in linear programming, so we get some simultaneous equations underpinning our probabilities.

01 The World

The red stuff is the particular outcome. Anything inside the box is the sum of all outcomes. Just take the space outside the distribution as zero, or ground.
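As a minimal sketch of that picture, assuming every outcome in the box is equally likely (an assumption the figure may not make), a probability is the count of outcomes meeting a condition over the count of all outcomes. The `probability` helper and the die example are illustrative, not from the figures.

```python
from fractions import Fraction

def probability(outcomes, condition):
    """Share of equally likely outcomes that satisfy the condition."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if condition(outcome))
    return Fraction(favorable, len(outcomes))

die = range(1, 7)  # the box: every outcome of one roll
p_even = probability(die, lambda face: face % 2 == 0)  # the red stuff
p_box = probability(die, lambda face: True)            # the whole box
```

The condition is where the constraints live. Tighten it and the red region shrinks; the box itself always sums to one.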

Lately, I got caught on the issue of what is the difference between iteration and recursion. I Googled it. I read a lot of that. I’ve done both. I’ve done recursive COBOL, something my IT-based, aka data processing, professor didn’t like. No, it was structured coding all the way. Sorry, but I was way early with objects at that point. But, back to the difference, no, none of it really struck me as significant.

What I really wanted was some explanation based on the Ito/Markov chain notions of memory. So I’ll try to explain it from that point of view. Let’s start with iteration.


02 Iteration

Iteration has some static or object variables where it saves the results of the latest iteration. I’m using an index and the typical for loop constructs. There are other ways to loop.

That’s code, but more significantly, is the experiment that we are iterating. The conditions and context of the experiment tell us how much data has to be stored. In iterations, that data is stored, so that it can be shared by all the iterations. Recursion will put this data elsewhere. The iteration generates or eats a sequence of data points. You may want to process those data points, so you have to write them somewhere. The single memory will persist beyond the loop doing the iteration, but it will only show you the latest values.
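A minimal sketch of that single shared memory, assuming a made-up experiment where each observation is just the square of the index. The names `iterate_experiment`, `state`, and `sequence` are illustrative.

```python
def iterate_experiment(n_trials):
    """Run a loop that keeps one memory shared by every iteration."""
    state = {"index": None, "running_total": 0}  # the single shared memory
    sequence = []                                # data points written elsewhere
    for i in range(n_trials):
        observation = i * i                      # hypothetical measurement
        state["index"] = i                       # overwritten on every pass
        state["running_total"] += observation
        sequence.append(observation)             # kept for later processing
    return state, sequence

state, sequence = iterate_experiment(4)
```

After the loop, `state` still exists but only shows the latest values. Anything earlier survives only because it was written to `sequence`.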

It can take a long time to iterate to say the next digit in pi. We can quickly forecast some values with some loose accuracy, call it nearly inaccurate, and replace the forecast with accurate values once we obtain those accurate values. Estimators and heuristics do this roughing out, sketching for us. They can be implemented as iterations or recursions. Multiprocessing will push us to recursion.

03 Iteration w Heuristic

Notice that I’ve drawn the heuristic’s arc to and from the same places we used for our initial iterations or cycles. The brown line shows the heuristic unrolled against the original iterations. This hints towards Fourier Analysis with all those waves in the composition appearing here just like the heuristic. That also hints at how a factor analysis could be represented similarly. Some of the loops would be closer together and the indexes would have to be adjusted against a common denominator.

Throughout these figures I’ve drawn a red dot in the center of the state. Petri nets use that notation, but I’m not talking Petri nets here. The red dots were intended to tie the state to the memory. The memory has to do with the processing undertaken within the state, and not the global notions of memory in Markov chains. The memory at any iteration reflects the state of the experiment at that point.


In recursion, the memory is in the stack. Each call has its own memory. That memory is sized by the experiment, and used during the processing in each call. Iteration stops on some specified index, or conditions. Recursion stops calling down the stack based on the invariant and switches to returning up the stack. Processing can happen before the call, before the return, or between the call and the return. Calling and returning are thin operations; processing, thick.

04 Recursion

The individual memories are shown as red vertical lines inside the spiral or tunnel. We start with calls and when we hit the invariant, the blue line, we do the processing and returning. We start at the top of the stack. Each call moves us towards the bottom of the stack, as defined by the invariant. Each return moves us back towards the top of the stack. The graph view shows the location of the invariant. The calling portion of the tunnel is shorter than the processing and returning portion of the tunnel.
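A minimal sketch of calling down and returning up, assuming a toy experiment that just sums the depths. What the post calls the invariant is what most texts call the base case. The names `descend`, `local_memory`, and `log` are illustrative.

```python
def descend(depth, limit, log):
    """Call down to `limit`, then return back up, logging each move."""
    local_memory = {"depth": depth}       # this call's own memory on the stack
    log.append(("call", depth))           # processing before the call
    if depth == limit:                    # the invariant: stop calling, start returning
        return depth
    below = descend(depth + 1, limit, log)
    log.append(("return", local_memory["depth"]))  # processing before the return
    return below + local_memory["depth"]

log = []
total = descend(0, 3, log)
```

Each call still finds its own `local_memory` intact on the way back up, which is the difference from the single overwritten memory of the loop.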

Notice that I’m calling the invariant the axis of symmetry. That symmetry would be more apparent for in-order evaluation. Pre-order evaluation and post-order evaluation would be asymmetrical, giving rise to skewed distributions.

Recursion is used in parsers and in processing trees, potentially game trees. In certain situations we are looking for convergences of distributions or sequences.

05 Convergence and Sequence

The black distribution here represents a Poisson distribution. This is the Poisson distribution of the Poisson game typical of the early adopter in the bowling alley of the technology adoption lifecycle. That Poisson distribution tends to the normal over time through a series of normals. The normals differ in the width of their standard deviations. That increase in widths over time is compensated for by lower heights, such that the area of each of those normals is one.
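One way to watch a Poisson tend to the normal is to measure the gap between the Poisson probabilities and a normal with the same mean and variance as the Poisson rate grows. This is a sketch under my own choice of gap measure (the sum of absolute differences at each integer); the rates 1, 10, and 100 are arbitrary, and time in the lifecycle is standing in for a growing rate, which is an assumption.

```python
import math
from statistics import NormalDist

def poisson_pmf(k, lam):
    """Poisson probability of k events, computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def gap_to_normal(lam):
    """Sum of |Poisson pmf - matching normal pdf| over the integers that matter."""
    normal = NormalDist(mu=lam, sigma=math.sqrt(lam))
    upper = int(lam + 10 * math.sqrt(lam)) + 10  # far enough into the right tail
    return sum(abs(poisson_pmf(k, lam) - normal.pdf(k)) for k in range(upper + 1))

gaps = {lam: gap_to_normal(lam) for lam in (1, 10, 100)}
```

The gap shrinks as the rate rises, so the later normals fit better than the early ones.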

We also show that each call or iteration can generate the next number in a sequence. That sequence can be consumed by additional statistical processing.

06 Numeric Convergence

Here, in a more analytic process, we are seeking the convergence points of some function f(n). We can use the standard approach of specifying a bound for the limit, or a more set theoretic limit where two successive values are the same, aka cannot be in the same set. Regardless of how that limit is specified, those limits are the points of convergence. Points of convergence give us the bounds of our finite world.
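Both stopping rules fit in one small loop. This is a sketch, assuming the Babylonian square root step as a stand-in for f(n); a tolerance above zero is the bound approach, and a tolerance of zero stops only when two successive values are identical. The `max_steps` guard is my addition so the loop always ends.

```python
def converge(step, start, tolerance=0.0, max_steps=1000):
    """Iterate `step` until successive values differ by at most `tolerance`."""
    previous = start
    for n in range(1, max_steps + 1):
        current = step(previous)
        if abs(current - previous) <= tolerance:
            return current, n             # point of convergence and steps used
        previous = current
    return previous, max_steps            # gave up: no convergence within the guard

def babylonian_sqrt2(x):
    """One refinement step toward the square root of 2 (hypothetical f)."""
    return (x + 2.0 / x) / 2.0

value, steps = converge(babylonian_sqrt2, 1.0, tolerance=1e-9)
```

The number of steps is the answer to how long it takes, and it depends on the bound you picked as much as on the function.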

Throughout I’ve used the word tunnel. It could be a spiral, or a screw. Wikipedia has a nice view of two 3D spirals, take a look. I didn’t get that complex here.

07 3D Sphere Spiral


When you experiment, and every click is an experiment in itself, or in aggregate, how long will it take to converge to a normal distribution, or to an analytic value of interest? What data is being captured for later analysis? What constraints and conditions are defining the experiment? How will you know when a given constraint is bent or busted, which in turn breaks the experiment and subsequent analysis?




Box Plots and Beyond

December 7, 2015

Last weekend, I watched some statistics videos. Along with the stuff I know, came some new stuff. I also wrestled with some geometry relative to triangles and hyperbolas.

We’ll look at box plots in this post. They tell us what we know. They can also tell us what we don’t know. Tying box plots back to product management, it gives us a simple tool for saying no to sales. “Dude, your prospect isn’t even an outlier!”

So let’s get on with it.

Box Plots

In the beginning, yeah, I came down that particular path, the one starting with the five number summary. Statistics can take any series of numbers and summarize them into the five number summary. The five number summary consists of the minimum, the maximum, the median, the first quartile, and the third quartile.
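A minimal sketch of the five number summary using Python’s standard `statistics` module. Quartile conventions vary between textbooks and tools; the `method="inclusive"` choice here is an assumption, and the sample numbers are made up.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for any series of numbers."""
    ordered = sorted(data)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
```

Nothing here needs a mean or a standard deviation, so the summary works whether or not the data is normal.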

boxplot 01

Boxplots are also known as box and whisker charts. They also show up as candlestick charts. We usually see them in a vertical orientation, and not a horizontal one.

boxplot 04

Notice the 5th and 95th percentiles appear in the figure on the right, but not the left. Just ignore them and stick with the maximum and minimum, as shown on the left. Notice that outliers appear in the figure on the left, but not on the right. Outliers might be included in the whisker parts of the notation or beyond the reach of the whiskers. I go with the latter. The figure on the left says the outliers are more than 3/2’s the upper quartile, or less than 3/2’s the lower quartile. Others say 1.5 * those quartiles. Notice that there are other data points beyond the outliers. We omit or ignore them.

The real point here is that the customer we talk about listening to is somewhere in this notation. Even when we are stepping over to an adjacent step on the pragmatism scale, we don’t do it by stepping outside our outliers. We do it by defining another population and constructing a box and whiskers plot for that population. When sales, through the randomizing processes they use, brings us a demand for functionality beyond the outliers of our notations, just say no.

We really can’t work in the blur we call talking to the customer. Which customer? Are they really prospects, aka the potentially new customer, or the customer, as in the retained customer? Are they historic customers, or customers in the present technology adoption lifecycle phase? Are they on the current pragmatism step or the ones a few steps ahead or behind? Do you have a box and whisker chart for each of those populations, like the one below?


This chart ignores the whiskers. The color code doesn’t help. Ignore that. Each stick represents a nominal distribution in a collective normal distribution. Each group would be a population. Here the sticks are arbitrary, but could be laid left to right in order of their pragmatism step. Each such step would have its own content marketing relative to referral bases. Each step would also have its own long tail for functionality use frequencies.

Now, we’ll take one more look at the box plot.


Here the outliers are shown going out to +/- 1.5 IQRs beyond the Q1 and Q3 quartiles. The IQR, the interquartile range, is the distance between Q1 and Q3. It’s all about distances.
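The distances in that diagram can be sketched in a few lines, assuming the common convention of fences at 1.5 interquartile ranges (IQR) beyond Q1 and Q3. The `usage` numbers are invented, and the `method="inclusive"` quartile rule is an assumption, since tools differ.

```python
from statistics import quantiles

def fences(data, k=1.5):
    """Lower and upper fences at k interquartile ranges beyond Q1 and Q3."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def beyond_fences(data, k=1.5):
    """Data points that fall outside the fences."""
    low, high = fences(data, k)
    return [x for x in data if x < low or x > high]

usage = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical feature-use counts
low, high = fences(usage)
flagged = beyond_fences(usage)
```

A request that lands where 30 does is the one to say no to, or the one that belongs to a different population with its own box plot.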

The diagram also shows Q2 as the median and correlates Q2 with the mean of a standard normal distribution. Be warned here that the median may not be the mean and when it isn’t, the real distribution would be skewed and non-normal. Going further, keep in mind that a box plot is about a series of numbers. They could be z-scores, or not. Any collection of data, any series of data has a median, a minimum, a maximum, and quartiles. Taking the mean and the standard deviation takes more work. Don’t just assume the distribution is normal or fits under a standard normal.

Notice that I added the terms upper and lower fence to the figure, as that is another way of referring to the whiskers.

The terminology and notation may vary, but in the mathematics sense, you have a sandwich. The answer is between the bread, aka the outliers.

The Normal Probability Plot

A long while back, I picked up a book on data analysis. The first thing it talked about was how to know if your data was normal. I was shocked. We were not taught to check this before computing a mean and a standard deviation. We just did it. We assumed our data fit the normal distribution. We assumed our data was normal.

It turns out that it’s hard to see if the data is normal. It’s hard to see on a histogram. It’s hard to see even when you overlay a standard normal on that histogram. You can see it on a box and whiskers plot. But, it’s easier to see with a normal probability plot. If the data once ordered forms a straight line on a plot, it’s normal.
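The straight line test can be put into numbers. This sketch pairs the ordered data with the quantiles of a standard normal and measures how straight the pairing is with a correlation coefficient. The plotting positions `(i - 0.5) / n` are one common convention, not the only one, and both samples are synthetic: one built to be bell shaped, one built from exponential quantiles to be skewed.

```python
import math
from statistics import NormalDist, mean

def normal_quantiles(n):
    """Standard normal quantiles at the plotting positions (i - 0.5) / n."""
    standard = NormalDist()
    return [standard.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def straightness(data):
    """Correlation between ordered data and normal quantiles; 1.0 is a straight line."""
    ys = sorted(data)
    xs = normal_quantiles(len(ys))
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

n = 50
bell_shaped = [50 + 10 * z for z in normal_quantiles(n)]                 # synthetic, normal by construction
skewed = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]         # synthetic, exponential shape
r_bell = straightness(bell_shaped)
r_skewed = straightness(skewed)
```

The number only summarizes the plot. Where the points leave the line, which is what shows the tails, still has to be read off the plot itself.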

The following figure shows various representations of some data that is not normal.

Not Normal on Histogram Boxplot Normal Probability Plot

Below are some more graphs showing the data to be normal on normal probability plots.

Normal Data

And, below are some graphs showing the data to not be normal on normal probability plots.

Non-normal Data

Going back to the first normal probability plot, we can use it to explore what it is telling us about the distribution.

Normal Probability Plot w Normal w Tails

Here I drew horizontal lines where the plotted line became non-normal, aka where the tails occur. Then, I drew a horizontal line representing the mean of the data points excluding the outliers. Once I exclude the tails, I’ve called the bulk of the graph, the normal portion, the normal component. I represent the normal component with a normal distribution centered on the mean. I’ve labeled the base axis of the normal as x0.

Then, I went on to draw vertical lines at the tails and the outermost outliers. I also drew horizontal lines from the outermost outliers so I could see the points of convergence of the normal with the x-axis, x0. I drew horizontal lines at the extreme outliers. At those points of convergence I put black swans of the lengths equal to the heights or thicknesses of the tails giving me x1 and x2.

Here I am using the notion that black swans account for heavy tails. The distribution representing the normal component is not affected by the black swans. Some other precursor distributions were affected, instead. See Fluctuating Tails I and Fluctuating Tails II for more on black swans.

In the original sense, black swans create thick tails when some risk causes future valuations to fall. Rather than thinking about money here I’m thinking about bits, decisions, choices, functionality, knowledge–the things financial markets are said to price. Black swans cause the points of convergence of the normal to contract towards the y-axis. You won’t see this convergence unless you move the x-axis, so that it is coincident with the distribution at the black swan. A black swan moves the x-axis.

Black swans typically chop off tails. In a sense they remove information. When we build a system, we add information. As used here, I’m using black swans to represent the adding of information. Here the black swan adds tail.

Back to the diagram.

After all that, I put the tails in with a Bezier tool. I did not go and generate all those distributions with my blunt tools. The tails give us some notion of what data we would have to collect to get a two-tailed normal distribution. Later, I realized that if I added all that tail data, I would have a wider distribution and consequently a shorter distribution. Remember that the area under a normal is always equal to 1. The thick blue line illustrates such a distribution that would be inclusive of two tails on x1. The mean could also be different.
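The wider means shorter trade can be checked numerically. This sketch sums a normal density over a fine grid to confirm the area stays at one, and compares peak heights for two widths. The sigmas of 1 and 2, the grid spacing, and the reach of eight standard deviations are arbitrary choices of mine.

```python
from statistics import NormalDist

def peak_and_area(sigma, steps_per_sigma=100, reach=8):
    """Peak height and numerically summed area of a zero-mean normal."""
    dist = NormalDist(mu=0.0, sigma=sigma)
    dx = sigma / steps_per_sigma
    n = 2 * reach * steps_per_sigma
    # Riemann sum of the density from -reach*sigma to +reach*sigma.
    area = sum(dist.pdf(-reach * sigma + i * dx) for i in range(n + 1)) * dx
    return dist.pdf(0.0), area

narrow_peak, narrow_area = peak_and_area(1.0)
wide_peak, wide_area = peak_and_area(2.0)
```

Doubling the width halves the peak, and the area comes out as one both times.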

One last thing: the distribution behind the normal probability plot I used was said to be a symmetric distribution with thick tails. I did not discover this. I read it. I did test symmetry by extending the x1 and x2 axes. The closer together they are the more symmetric the normal distribution would be. It’s good to know what you’re looking at. See the source for the underlying figure and more discussion at


Always check the normality of your data with a normal probability plot. Tails hint at what was omitted during data collection. Box plots help us keep the product in the sandwich.


