“Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.” – Aaron Levenstein
“Do people lie with graphs”: the phrase fetched me 32,900,000 results on Google in 0.43 seconds today. Yes, I am aware, as you should be, that these numbers do not prove a thing. We can, however, safely guess that ‘lying’ is a word that comes up frequently in connection with data visualizations such as graphs. But then again, this would rest on my assumption that 32,900,000 results is a lot.
See where this is leading to?
All statistics have a context, one that is dynamic, ever-changing, and often subjective. Attempting to create a data visualization is like taking a snapshot that captures one instant of a moving image in a bigger world. It is no surprise, then, that data visualizations can lead to faulty claims or misinterpretations, whether intentionally or unintentionally. Even simple choices, such as the axes or the time period, can affect the particular story that is drawn from a visualization.
Subjectivity in visualizations
“The information the visualization reveals shapes the perception of the reader.” – Alberto Cairo
Following the much-discussed gang rape in Delhi on 16 December 2012, there was a spate of articles on the high rate of rape cases in India. One such story was carried by the New York Times, which made some severe but, in my opinion, relevant claims: “India must work on changing a culture where women are routinely devalued.” (Rape in the World’s Largest Democracy – The New York Times Editorial, 28 December 2012)
When viewed in isolation, it is indeed the case that the number of reported rape cases in India has been increasing every year. But place India in the context of the rest of the world. In his blog post Lies, Damned Lies, Rape and Statistics, Sharad Goel compares the number of police-recorded rape cases across various countries and shows that India features at the lower end of the figure, particularly in comparison to the United States, a comparison the NY Times did not bring into the picture. Notice also that Goel reports the number of rape cases as a proportion per 100,000 people. This is another vital choice that might have led to a different interpretation had the figures been absolute, for the population of India far exceeds that of most of the countries that rank above it on the graph.
Does this tell a different story than the one presented by the NY Times? Yes, it does. But so does this fact: whereas rape incidents in India were earlier kept secret to protect family reputations, more and more Indian women are now officially and openly reporting rape. Sharad Goel’s graph illustrates the number of police-recorded cases. However, many incidents of rape in India go unreported, and many are dismissed by the police themselves amidst the prevalent bureaucracy and corruption. This means the actual number of rape cases in India might be severely underrepresented in Goel’s graph, as he himself responsibly notes.
Avoiding the trap of deceptive perspectives
The question that follows, then, is: can we avoid the trap of the possible deceptions that data and visualizations pose when viewed from a single perspective? I believe we can. However, I also know that it takes practice, experience, and conscious processing and questioning of what we see in the visual in front of us. Below is a list of important elements to look at when presented with, or presenting, a data visualization. To make them simpler to remember, I’ve collated them as the ABCDE of data visualizations:
Axes: Check what measures the graph represents and whether they make sense in the given context. It is also important to check at what point the numbers start and in which direction they run. In the example above, axes representing only police-recorded rape cases instead of the actual total of rape cases in a country could lead to misinterpretations.
Baseline: Baselines determine the point of comparison. They can be a zero point, a previous time, an ideal state or a predicted state. In other words, they vary across story perspectives. In the above example, if the rape cases were measured for India alone, the comparison would be made between subsequent years. In the multi-country graph however, comparisons are made across the countries.
Change rates: Check whether the measures are represented as absolute values or as ratios and percentages. This can influence the way you interpret the data. As explained above, the total number of rape cases, as opposed to the number of rape cases per 100,000 population, would have led to different interpretations because of the differing population sizes between countries.
Dates: The time series over which the data is gathered forms an important part of the story. Some contexts, like stock markets, may show interesting trends across single weeks and dates, whereas other contexts, like population growth, operate over longer time periods. The graph above stops at 2009, already five years out of date.
Excluded information: Look out for any related information that the graph doesn’t cover but that could influence the interpretation. Once again, in the case above, the fact that many rape cases go unreported in India might be a vital element to keep in mind while drawing inferences.
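The “Change rates” point above can be made concrete with a small sketch: ranking countries by absolute counts and ranking them by a per-100,000 rate can tell two different stories. All the figures and country names below are invented purely for illustration; they are not real crime statistics.

```python
# Hypothetical reported-case counts and populations (invented numbers,
# not real data) for three fictitious countries.
reported_cases = {"Country A": 24_000, "Country B": 8_000, "Country C": 90_000}
population = {"Country A": 1_200_000_000, "Country B": 60_000_000, "Country C": 300_000_000}

# Normalize to a rate per 100,000 people, as Goel's graph does.
rate_per_100k = {c: reported_cases[c] / population[c] * 100_000 for c in reported_cases}

# Rank the countries both ways.
by_absolute = sorted(reported_cases, key=reported_cases.get, reverse=True)
by_rate = sorted(rate_per_100k, key=rate_per_100k.get, reverse=True)

print("Ranked by absolute count:", by_absolute)
print("Ranked by rate per 100k: ", by_rate)
```

With these made-up numbers, Country A ranks second by absolute count but last by rate, simply because its population is so much larger. The same chart, drawn from the same underlying incidents, supports two different headlines depending on which measure the axes carry.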
In pursuit of objectivity
In so far as journalistic stories stem from individual perspectives, I am of the opinion that an inarguable objectivity remains unachievable. It has eluded us for decades and will continue to do so. Data, however, are often considered irrefutable and uniquely representative of the truth. The above example illustrates that even when working with data and visualizations, the story that journalists decide to tell and visualize will be coloured to some extent by their individual perspectives, the data they have access to, and the elements they might have overlooked.
Although the ABCDE above can be a vital guideline for avoiding deceptive interpretations, it is by no means fool-proof, or even comprehensive for that matter. Each story comes with its own context, and data and visualizations should be interpreted on a case-by-case basis. For the rest, I think that all we as journalists can do is:
(a) Openly present our limitations and admit to our errors should they occur (i.e., practice transparency)
(b) Follow the norms of responsible reporting, and present arguments with good reason and evidence