More statistics this week. Again, surprise ensued. I’ll be talking math, but thinking product management. I’ve always thought in terms of my controls and their frequency of use, but when did the data converge? When does it time series on me? Agile and Minimal Viable Product are experiment based. But, how deep is our data?

So while everything I’m going to say here is not new to anyone maybe, you’ll be surprised somewhere along the way.

First, we start with the definition of probability.

The stuff between the equal signs is predicate calculus, or mathematical logic, the easy stuff. It’s just shorthand, shorthand that I never used to get greasy with my notes. In college, I wanted to learn it, but the professor didn’t want to teach it. He spent half the semester reviewing propositional calculus, which was the last thing I needed.

Moving on.

What surprised me was “conditions or constraints.” That takes me back to formal requirements specification, in the mid to late 80’s, where they used an IF…Then… statements to prove the global context of what program proving could only prove locally. Requirements were questions. Or, Prolog assertions that proved themselves.

Constraints are the stuff we deal with in linear programming, so we get some simultaneous equations underpinning our probabilities.

The red stuff is the particular outcome. Anything inside the box is the sum of all outcomes. Just take the space outside the distribution as zero, or ground.

Lately, I got caught on the issue of what is the difference between iteration and recursion. I Googled it. I read a lot of that. I’ve done both. I’ve done recursive Cobol, something my IT-based, aka data processing, professor didn’t like. No, it was structured coding all the way. Sorry, but I was way early with objects at that point. But, back to the difference, no none of it really stuck me as significant.

What I really wanted was some explanation based on the Ito/Markov chain notions of memory. So I’ll try to explain it from that point of view. Lets start with iteration.

**Iteration**

Iteration has some static or object variables where it saves the results of the latest iteration. I’m using an index and the typical for loop constructs. There are other ways to loop.

That’s code, but more significantly, is the experiment that we are iterating. The conditions and context of the experiment tell us how much data has to be stored. In iterations, that data is stored, so that it can be shared by all the iterations. Recursion will put this data elsewhere. The iteration generates or eats a sequence of data points. You may want to process those data points, so you have to write them somewhere. The single memory will persist beyond the loop doing the iteration, but it will only show you the latest values.

It can take a long time to iterate to say the next digit in pi. We can quickly forecast some values with some loose accuracy, call it nearly inaccurate, and replace the forecast with accurate values once we obtain those accurate value. Estimators and heuristics do this roughing out, sketching for us. They can be implemented as iterations or recursions. Multiprocessing will push us to recursion.

Notice that I’ve drawn the heuristic’s arc to and from the same places we used for our initial iterations or cycles. The brown line shows the heuristic unrolled against the original iterations. This hints towards Fourier Analysis with all those waves in the composition appearing here just like the heuristic. That also hints at how a factor analysis could be represented similarly. Some of the loops would be closer together and the indexes would have to be adjusted against a common denominator.

Throughout these figures I’ve drawn a red dot in the center of the state. Petri nets use that notional, but I’m not talking Petri nets here. The red dots were intended to tie the state to the memory. The memory has to do with the processing undertaken within the state, and not the global notions of memory in Markov chains. The memory at any iteration reflects the state of the experiment at that point.

**Recursion**

In recursion, the memory is in the stack. Each call has its own memory. That memory is sized by the experiment, and used during the processing in each call. Iteration stops on some specified index, or conditions. Recursion stops calling down the stack based on the invariant and switches to returning up the stack. Processing can happen before the call, before the return, or between the call and the return. Calling and returning are thin operations; processing, thick.

The individual memories are shown as red vertical lines inside the spiral or tunnel. We start with calls and when we hit the invariant, the blue line, we do the processing and returning. We start at the top of the stack. Each call moves us towards the bottom of the stack, as defined by the invariant. Each return moves us back towards the top of the stack. The graph view shows the location of the invariant. The calling portion of the tunnel is shorter than the processing and returning portion of the tunnel.

Notice that I’m calling the invariant the axis of symmetry. That symmetry would be more apparent for in-order evaluation. Pre-order evaluation, and post-order evaluation would be asymmetrical, or giving rise to skewed distributions.

Recursion is used in parsers and in processing trees, potentially game trees. In certain situations we are looking for convergences of distributions or sequences.

The black distribution here represents a Poisson distribution. This is the Poisson distribution of the Poisson game typical of the early adopter in the bowling ally of the technology adoption lifecycle. That Poisson distribution tends to the normal over time through a series of normal. The normal differ in the width of their standard deviations. That increase in widths over time is compensated for by lower heights, such that the area of all those normal is one.

We also show that each call or iteration can generate the next number in a sequence. That sequence can be consumed by additional statistical processing.

Here, in a more analytic process, we are seeking the convergence points of some function f(n). We can use the standard approach of specifying a bounds for the limit, , or a more set theoretic limit where two successive values are the same, aka cannot be in the same set. Regardless of how that limit is specified, those limits are the points of convergence. Points of convergence give us the bounds of our finite world.

Throughout I’ve used the word tunnel. It could be a spiral, or a screw. Wikipedia has a nice view of two 3D spirals, take a look. I didn’t get that complex here.

**Onward**

When you experiment, and every click is an experiment in itself, or in aggregate, how long will it take to converge to a normal distribution, or to an analytic value of interest? What data is being captured for later analysis? What constraints and conditions are defining the experiment? How will you know when a given constraint is bent or busted, which in turn breaks the experiment and subsequent analysis?

## Leave a Reply