In “What the Fox Knows,” his blog-post manifesto on data journalism, Nate Silver argued that political scientist Ray Wolfinger’s aphorism, “the plural of anecdote is data,” makes sense. “Data does not have a virgin birth,” Silver wrote. “It comes to us from somewhere. Someone set up a procedure to collect and record it.” Silver broke down the process by which anecdote becomes data into four steps: collection, organization, explanation, and generalization.
Collection and organization are the essential first two steps, and comparatively easier than the next two steps. However, this doesn’t mean collection and organization are always done well. As Silver noted, “the most problematic news stories are often those that leap ahead in the process, drawing grand conclusions from thin evidence.”
“Most of us learn by metaphors and stories,” Silver explained, “so traditional journalism’s method of organizing information into stories has a lot of appeal when news happens. Still, there are some handicaps that conventional journalism faces when it seeks to move beyond reporting on the news to explaining it.” Explanation is fraught with the perils of subjectivity and the human mind’s strong desire for establishing causality even when weak correlations are discovered in data.
Generalization is the hardest step of all. “No matter how well you understand a discrete event,” Silver explained, “it can be difficult to tell how much of it was unique to the circumstances, and how many of its lessons are generalizable into principles. Generalization is a fundamental concern of science, and it’s achieved by verifying hypotheses through predictions or repeated experiments. Predictions in the sciences (especially the social sciences) are often fairly poor. They usually get better after repeated trials and iterations. But they require a lot of work.”
“There’s a trade-off between vividness and scalability,” Silver explained. “Narrative accounts of individual news events can be informative and pleasurable to read, and they can have a lot of intrinsic value whether or not they reveal some larger truth. But it can be extraordinarily hard to make generalizations about news events unless you stop to classify their most essential details according to some numbering or ordering system, turning anecdote into data.”
Turning Data into Doctrine
Silver’s description of the challenge of data journalism, how to make a news story vivid and accessible to a broad audience without sacrificing rigor and accuracy, also describes the fundamental challenge of all categories of data analysis and reporting, including those used to drive strategic, tactical, and operational business decisions. Here the difficult step of generalization means analyzing the data to determine whether an event was unique to its circumstances or whether its lessons can be generalized into guiding principles, which are then formalized into policies and procedures. This means that after turning anecdote into data, organizations are turning data into doctrine. (And governments are turning data into laws and regulations, but that’s for another blog post.)
This is why outliers can be the demons or the darlings of data. Sometimes an outlier is insight into a trend; other times it is an indication of a data quality or process quality issue. “Sometimes it can be extraordinarily valuable to explore an outlier in some detail,” Silver explained. The goal of this outlier analysis, however, “should be to explain why the outlier is an outlier, rather than indicating some broader trend.” Unfortunately, the latter is done more often, turning the exceptions into the rules to be followed. How many corporate policies, for example, have been enacted in reactive fear over a single isolated incident (i.e., an outlier)?
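To make the idea of singling out an outlier for explanation a little more concrete, here is a minimal sketch in Python. It is not Silver’s method. It assumes one common rule of thumb, Tukey’s fences (flag anything more than 1.5 interquartile ranges beyond the quartiles), and the function name `flag_outliers` and the monthly incident counts are hypothetical, invented purely for illustration.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return the values that fall outside Tukey's fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR. The multiplier k=1.5 is a
    conventional rule of thumb (an assumption here), not a law of nature.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical data: incidents recorded per month over one year.
monthly_incidents = [2, 3, 1, 2, 4, 3, 2, 19, 3, 2, 1, 3]

print(flag_outliers(monthly_incidents))
```

With this made-up data the sketch flags only the month with 19 incidents. Flagging is where the analysis starts: the next step is to explain why that month is an outlier, not to write a new policy around it.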
One technique used by criminal psychologists to determine if someone is telling a true story is to ask them to tell it in reverse order. This is because while it’s hard to tell any story backwards, a lot of contradictory details surface when even a well-rehearsed lie is retraced. As long and winding as the road from anecdote to data to doctrine is, retracing the steps from doctrine to data to anecdote has so many twists and turns that often it’s difficult, if not impossible, to find the way back.
Just as a journey of a thousand miles begins with a single step, a journey of a thousand data points begins with a single anecdote. The breadcrumb bits of data that mark the trail home can be hard to find, especially after people take a byte here and a byte there, leaving hindsight bias to fill in the blanks and create a more compelling story. In some cases this results in telling an impenetrable lie, backwards and forwards.
As Silver said, data comes to us from somewhere because someone set up a procedure to collect and record it. While where, who, and how are important questions to ask, it’s also important to remember that the answers are often more fiction than fact.