N-Grams and Strategy

I’ve been flying around a lot in my recent travels and haven’t managed to use my laptop, but I did get to read a few books. The math book I was dragging around was one I had to take notes on while reading, so I stopped in at the Hudson’s Books shops that seem to be in almost every airport and picked up a quick airplane read. My first such book was “Uncharted: Big Data as a Lens on Human Culture” by Erez Aiden and Jean-Baptiste Michel, http://www.amazon.com/Uncharted-Data-Lens-Human-Culture/dp/1594632901/ref=sr_1_2?s=books&ie=UTF8&qid=1418721581&sr=1-2&keywords=uncharted+in+books. This is the first book I’ve read on n-grams. It was the charts that drew me to the book.

I’ve written about charting use frequencies, how that chart is our long tail, and more. The frequencies of use I talked about were those of features and content in our content marketing and support universes. In “Uncharted,” the authors talk about word frequencies drawn from a huge corpus of digitized books, going back as far as possible. They used books because books are stable purveyors of history.

In their first chart, they looked at “the United States” used as a singular noun and as a plural noun. That usage changed over historical time: singular replaced plural usage in 1880. OK, so what? Well, if you had managed a piece of software back then, you would eventually have had to edit your UI and your content marketing. Worse, you might have had to change your concept model, data structures, and existing features, and add some new functionality. The lexical network reflects the semantic network, and much of that network is embodied in code. As changes ripple through that network, you might also consider how cognitive limits shape such networks and architectures. Worse still, these lexical changes can escape the code and become organizational issues.

Consider the fuzzy concepts of wants and needs. Marketers mess with this fuzziness often enough. It’s as bad as sitting through a product camp talk about value: most of it is correct, but barely, and it usually stays much too close to the interface, limiting the value we deliver to customers. But back to wants and needs, words battling it out in lexical space, that is, in the n-grams and the charts drawn from them. In “Uncharted,” the authors mention a study of wants and needs.

Need vs Want 01

This first chart shows the raw n-gram frequencies for “I want” and “I need.” In 1800, people needed more than they wanted. In 1862, that changed: wants began to outstrip needs, and needs faded into the background of our lives. This is the kind of thing that happens with feature frequencies all the time. We talk about email dying, but really, we still check our email.
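To make the crossover concrete, here is a minimal Python sketch that scans two yearly frequency series and reports the first year one overtakes the other. The function name first_crossover and all of the numbers are hypothetical, made up for illustration; they are not the Google Books counts behind the chart, and with sampled years the answer is only as precise as the sampling.

```python
from typing import Optional, Sequence


def first_crossover(years: Sequence[int], a: Sequence[float], b: Sequence[float]) -> Optional[int]:
    """Return the first year in which series `a` moves ahead of series `b`
    after being at or behind it the year before, or None if it never does."""
    for i in range(1, len(years)):
        if a[i - 1] <= b[i - 1] and a[i] > b[i]:
            return years[i]
    return None


# Hypothetical frequencies for illustration only, NOT real n-gram data.
years = [1800, 1820, 1840, 1860, 1880, 1900]
i_need = [4.0, 4.2, 4.1, 4.0, 3.9, 3.8]
i_want = [3.0, 3.3, 3.7, 3.9, 4.5, 5.0]

print(first_crossover(years, i_want, i_need))  # 1880
```

Swapping the arguments asks the opposite question (when did needs overtake wants?), which for this made-up data returns None.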

Taking a step back, what we see are two lines intersecting. We see a game in the game-theoretic sense.

Need vs Want 03

The graph can be expressed as a collection of mixed strategies, and the numbers in the ratios reflect the difference between the two competitors. Still, will our organization serve both groups of users? Will we deliver the lower-priced needs, or will we go for the higher-priced wants? Will we deal in repeat business, or will we be a hits-based business? Will we create an organization that does one of these well, both well, or one of them less well? Somebody gets to decide. Those decisions end up in our offers, our organizations, and our financial results.
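One way to read those ratios, assuming each year's counts for the two phrases can be normalized into a pair of shares that sum to one, is sketched below in Python. The function mixed_strategy and the counts are hypothetical; this illustrates the normalization only and is not the method used in “Uncharted.”

```python
from fractions import Fraction
from typing import Tuple


def mixed_strategy(need: int, want: int) -> Tuple[Fraction, Fraction]:
    """Normalize two counts into shares (p_need, p_want) that sum to 1."""
    total = need + want
    if total == 0:
        raise ValueError("no occurrences of either phrase")
    return Fraction(need, total), Fraction(want, total)


# Hypothetical counts for illustration only, NOT real n-gram data.
counts = {1800: (40, 30), 1880: (39, 45)}

for year, (need, want) in counts.items():
    p_need, p_want = mixed_strategy(need, want)
    print(year, f"need {float(p_need):.1%} / want {float(p_want):.1%}")
# 1800 need 57.1% / want 42.9%
# 1880 need 46.4% / want 53.6%
```

Exact fractions keep the two shares summing to exactly one; the float conversion happens only for display.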

Need vs Want 02

Another view of the same chart tells us about the undifferentiated infrastructure (tan); the infrastructure for need; the infrastructure for want; growth and decline; convergence, divergence, and steady state; and relative investment levels. It can even hint at world sizes. The point labeled zero is where the words meant pretty much the same thing; they were interchangeable. Max is where the maximum difference was achieved. The differences between the words lie in their connotations and denotations.
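The max point, and the convergence or divergence on either side of the zero point, can be read off the signed gap between the two series. Here is a minimal Python sketch, assuming yearly counts for both phrases; the counts and the helper names gaps, max_divergence, and trend are hypothetical.

```python
from typing import List, Sequence


def gaps(need: Sequence[int], want: Sequence[int]) -> List[int]:
    """Signed difference per year: positive when wants lead, negative when needs lead."""
    return [w - n for n, w in zip(need, want)]


def max_divergence(years: Sequence[int], g: Sequence[int]) -> int:
    """Year in which the two series are furthest apart, in either direction."""
    i = max(range(len(g)), key=lambda k: abs(g[k]))
    return years[i]


def trend(g: Sequence[int]) -> List[str]:
    """Label each step by whether the gap's magnitude shrinks, grows, or holds."""
    labels = []
    for prev, cur in zip(g, g[1:]):
        if abs(cur) < abs(prev):
            labels.append("converging")
        elif abs(cur) > abs(prev):
            labels.append("diverging")
        else:
            labels.append("steady")
    return labels


# Hypothetical counts for illustration only, NOT real n-gram data.
years = [1800, 1820, 1840, 1860, 1880, 1900]
need = [40, 42, 41, 40, 39, 38]
want = [30, 33, 37, 39, 45, 50]

g = gaps(need, want)
print(g)                         # [-10, -9, -4, -1, 6, 12]
print(max_divergence(years, g))  # 1900
print(trend(g))                  # ['converging', 'converging', 'converging', 'diverging', 'diverging']
```

The labels look only at magnitude, so the crossover itself shows up as the sign change in the gap (between 1860 and 1880 here) rather than as a label of its own.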

It’s not just about word frequencies. Words are proxies. The authors mention several proxies they used to study things that had no words or n-grams of their own to look at.

Given that, the frequencies of use of our own functionality can be explored the same way. Seeing across categories will be harder and legally more complicated. You might not know why your Save function is being used less, but seeing the decline tells you to go find the reason. Competitive thrusts will show up in your use frequencies.

Those mixed-strategy ratios are measures of differentiation. They tell us about the offer, the company, and the customer. And notice that these charts are about counts rather than statistics and probability distributions. Capturing your server log entries as histories of use frequencies might require some work within your organization, but the clicks are there for the charting.
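Here is a minimal Python sketch of turning log lines into per-month use-frequency histories. It assumes a made-up log format with one event per line, written as timestamp, user, and feature=name; real server logs will differ, and the parsing would need to change to match them. The function usage_history and the sample lines are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def usage_history(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count feature uses per month from lines in an ASSUMED format:
    '<ISO timestamp> <user> feature=<name>'."""
    history: Dict[str, Counter] = defaultdict(Counter)  # month -> feature counts
    for line in lines:
        parts = line.split()
        if len(parts) < 3 or not parts[2].startswith("feature="):
            continue  # skip lines that don't match the assumed format
        month = parts[0][:7]  # "YYYY-MM" prefix of the timestamp
        feature = parts[2][len("feature="):]
        history[month][feature] += 1
    return history


# Hypothetical log lines for illustration only.
logs = [
    "2014-11-03T09:12:00 u1 feature=save",
    "2014-11-03T09:15:00 u2 feature=save",
    "2014-11-04T14:02:00 u1 feature=export",
    "2014-12-01T10:00:00 u3 feature=save",
    "2014-12-02T11:30:00 u2 feature=export",
    "2014-12-02T11:31:00 u2 feature=export",
]

history = usage_history(logs)
for month in sorted(history):
    print(month, dict(history[month]))
# 2014-11 {'save': 2, 'export': 1}
# 2014-12 {'save': 1, 'export': 2}
```

These are plain counts, in keeping with the charts above; comparing a feature's count month over month is enough to notice a Save function being used less and send you looking for the reason.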

Comment please.
