Why enough data will never trump a script
What happened the last time you asked your parents for a Mercedes convertible?
“In the words of St. Amen: go to hell, kid.” Enough said.
“The store is closed right now, but maybe tomorrow.” Sarcasm detector required.
“Why?”
“Because all my friends have one and it’s making me look bad!”
“Sure, we’ll go on the weekend.” Money alert.
Ask yourself, rich or poor, somewhere inside us, don’t we all agree that a Mercedes convertible is a big request? Ridiculous, unreasonable, childish…maybe other descriptions come to mind depending on who you are, but the general consensus is that it is a big request.
Similarly, there are other commonly known big requests. Getting a fancy makeover, buying expensive clothes, asking for too much at a restaurant.
“Waiter, can I have everything on the menu? Maybe, a barf bag to go with it and a sweat towel?”
Yeah, not gonna happen…
Now remember, scripts don’t account for exceptions. That guy could be a dear, dear friend of the owner, so things might play out differently for him, but we don’t see that often.
The script for a big request is likely to involve resistance; for most of us, a flat-out NO. Asking for too much at a restaurant and buying a Mercedes convertible are substantively unrelated requests, yet the human mind recognizes a theme and similar probable outcomes.
Could enough data cover contextual information? Is a script really necessary? If we gathered all the data, would that be enough to capture relationships, patterns, and context?
Let’s take the Mercedes convertible and the fancy makeover examples. The broader context here is that they are both big requests. Now, let’s think about data fields relating to this contextual theme for each of these requests individually.
Mercedes convertible, possible data fields: average response, price, number of people who own one, number of people who don’t.
Fancy makeover, possible data fields: average response, price, number of people who get one, number of people who don’t.
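To make that concrete, here’s a minimal sketch in Python of what those two records might look like side by side. Every field name and number is invented purely for illustration:

```python
# Hypothetical records for the two requests; every field name and value
# here is invented for illustration, not drawn from real data.
mercedes_convertible = {
    "average_response": "no",      # what askers typically hear
    "price_usd": 80_000,
    "owners": 1_200_000,
    "non_owners": 7_000_000_000,
}

fancy_makeover = {
    "average_response": "no",
    "price_usd": 500,
    "people_who_get_one": 40_000_000,
    "people_who_dont": 7_000_000_000,
}

# Only two field names line up across the two records. Nothing in the
# data itself says that "owners" and "people_who_get_one" play the same
# role, let alone that both records are instances of one abstract theme:
# a big request.
shared_fields = mercedes_convertible.keys() & fancy_makeover.keys()
print(shared_fields)  # {'average_response', 'price_usd'}
```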
Now, think about these data fields. For a process to tie them together across the two examples and abstract the theme that both are big requests, what kind of analysis would it require?
Statistically, it could analyze a random group of people. For statistical validity the sample has to satisfy certain conditions, which basically means that you can’t just talk to 100 people from a small town in Indiana or a big city in California. Though 100 is a decent size, the sample is not heterogeneous enough.
Let’s say a sample was drawn randomly from all over the world. If it included more rich people than poor people for the Mercedes convertible example, and fewer rich people than poor people for the fancy makeover example, the analysis process wouldn’t know how to relate the results. You could redraw the sample a couple of times to improve its integrity, but it’s still very hit and miss.
You could literally call the whole world your sample: billions of people. Analyzing a sample that large would give you the best and most credible results, but it defeats the point of statistics, which exists to work smart, not large.
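Here’s a rough Python simulation of that sampling trap, with every probability made up for illustration: two skewed samples of 100 people each, answering the same kind of big request:

```python
import random

random.seed(0)

def simulate_yes_rate(n, share_rich):
    """Simulated yes-rate for a big request. The odds are invented:
    assume rich respondents say yes 60% of the time, others 5%."""
    yeses = 0
    for _ in range(n):
        rich = random.random() < share_rich
        p_yes = 0.60 if rich else 0.05
        yeses += random.random() < p_yes
    return yeses / n

# Same underlying behavior, but the two samples are skewed in opposite
# directions, like the rich-heavy vs. rich-light samples above.
convertible = simulate_yes_rate(100, share_rich=0.7)  # too many rich people
makeover = simulate_yes_rate(100, share_rich=0.1)     # too few rich people

print(f"convertible yes-rate: {convertible:.0%}")
print(f"makeover yes-rate:    {makeover:.0%}")
# The rates diverge sharply even though both requests follow the same
# "big request" script; the gap is an artifact of who got sampled.
```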
Another issue is nonsensical data fields. For example, if you were trying to find out whether a request is big by comparing and analyzing data across a set of examples, you could end up drawing correlations that show up in the data but make no operational sense.
If receiving a no in response to your request is a key criterion, and the data tells you that this happens every time a storm hits Uranus, the analysis would conclude that the likelihood of a big request is correlated with weather patterns on Uranus.
“Wait, what? That’s ridiculous!”
“Numbers don’t lie, bro.”
“Yeah, but the numbers are stupid.”
*Awkward impasse*
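That awkward impasse is easy to reproduce. Here’s a small Python sketch, with every data point randomly generated and every field name invented, that mines a pile of junk fields and finds one that “predicts” denials by chance alone:

```python
import random

random.seed(42)

# 50 observed big requests: 1 means the request was denied (it usually is).
denied = [1 if random.random() < 0.9 else 0 for _ in range(50)]

# 1,000 junk fields with no causal link to anything, standing in for
# things like "a storm hit Uranus that day". Pure random noise.
junk_fields = {
    f"junk_{i}": [random.randint(0, 1) for _ in range(50)]
    for i in range(1000)
}

def agreement(a, b):
    """Fraction of observations where two binary series match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Mine the junk for the field that best "predicts" denial.
best = max(junk_fields, key=lambda name: agreement(junk_fields[name], denied))
print(best, f"agrees {agreement(junk_fields[best], denied):.0%} of the time")
# With 1,000 random fields and only 50 observations, some field will
# track the denials most of the time by pure chance: the storms-on-Uranus
# effect. The numbers aren't lying, but they aren't saying anything either.
```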
People don’t realize the full versatility of human intelligence. Data, no matter how comprehensive, cannot do away with the need for scripts. Not only is generalizing from raw data hard, it can lead to downright nonsensical conclusions.