Gone are the times when information was the privilege of exclusive insiders and an elite few. In the digital world of today, data is a resource for the masses. It has pervaded nearly every field, so much so that basic statistical techniques now form a core course in almost all university programs. So how does this affect the field of journalism?
Navigating the data flood
Open public data, crowd-sourced data, data maintained in private databases: there are stories to be had everywhere! The skill lies in the crafting: the cleaning, combining and shaping of those vast arrays of digits and characters into a tale worth telling (see Paul Bradshaw’s inverted pyramid description of the data journalism process). This, in essence, is the core of the emerging field of data journalism. A data journalist, much like a film-maker or animator, brings numbers to life and makes the stories they tell relatable to everyone. However, story-telling with numbers is a skill that needs to be honed. Inexperience in dealing with numbers often exposes journalists to two kinds of danger:
- Not knowing how to find a story
- Presenting a faulty story
The former issue can be addressed with some simple searches, such as looking for outliers, correlations and other patterns. The latter problem, presenting a faulty story, is the more serious one, mainly because it threatens the very building blocks of journalism: reliability and veracity. A faulty data analysis can lead to wrong interpretations and a serious misrepresentation of the issue at hand.
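As a rough illustration of what looking for outliers and correlations can mean in practice, here is a minimal Python sketch. It assumes Tukey's interquartile-range rule for flagging outliers and the Pearson coefficient for measuring correlation; both are common choices rather than the only ones. The function names and the numbers are hypothetical and invented for this example.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR beyond the quartiles (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Made-up illustrative counts, not real data.
monthly_incidents = [12, 14, 13, 15, 11, 13, 48, 12, 14]
print("Possible outliers:", iqr_outliers(monthly_incidents))
```

A flagged value is only a lead worth checking: it could be a genuine story or simply a data-entry error.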
The quintessential example of a story gone wrong
While searching for an interesting example to explain what can go wrong in the process of data-powered storytelling, I hit the jackpot!
Here’s a story that makes a perfect laboratory specimen for budding data journalists to dissect: Stop forcing people to wear bike helmets. This report abounds with examples of faulty data analysis. I’ll focus on just three of its numerous blunders to set the scene here.
Claim 1 (sweeping statement based on inconclusive evidence): “If you don’t feel like wearing a helmet while biking, that’s fine”
Setting aside the off-handedness of the remark itself (“that’s fine”?), none of the arguments stated in the text actually supports this claim. Take for example: “(…) study after study has shown, you’re better off with a helmet if you’re in an accident” (which contradicts the claim); or “While they do protect your head during accidents, there’s some evidence that helmets make it more likely you’ll get in an accident in the first place” (faulty reasoning).
Claim 2 (mistaking correlation for causality): “The data on whether helmets reduce total accidents is ambiguous”
Why should wearing helmets reduce the number of accidents? A helmet might reduce the chance of a fatality or severe injury when a person in a bike accident hits their head. But there is no reason to expect that wearing one would cause a reduction in the total number of accidents.
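To see how two quantities can be strongly correlated without one causing the other, consider the following minimal Python sketch. It is a simulation with invented numbers, not an analysis of any helmet data: a hypothetical hidden factor drives two variables `a` and `b`, which never influence each other directly. All names here are assumptions made for illustration.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n=1000, seed=0):
    """Generate a hidden factor plus two variables that each depend only on it."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hidden = [rng.random() for _ in range(n)]
    a = [h + rng.gauss(0, 0.1) for h in hidden]
    b = [h + rng.gauss(0, 0.1) for h in hidden]
    return hidden, a, b

hidden, a, b = simulate()
raw = pearson_r(a, b)
# Subtract the hidden factor: what remains is independent noise.
adjusted = pearson_r([x - h for x, h in zip(a, hidden)],
                     [y - h for y, h in zip(b, hidden)])
print(f"correlation of a and b: {raw:.2f}")
print(f"after removing the hidden factor: {adjusted:.2f}")
```

In this toy setup the strong correlation between `a` and `b` largely disappears once the hidden factor is accounted for, which is exactly the situation in which a causal reading would mislead.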
Claim 3 (unrepresentative sampling): “Drivers seem to be less cautious around helmeted bikers.”
This might have made an interesting argument, had it not been based on the observations of just one researcher on a single 200-mile bike ride.
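Sample size is only part of the problem, but it is the part that is easiest to put a number on. The sketch below uses the standard normal-approximation margin of error for a proportion, which assumes a simple random sample. Observations from a single rider are not a random sample of cyclists, so even this formula would flatter such a study. The function name and the sample sizes are hypothetical.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p observed in n random samples."""
    return z * math.sqrt(p * (1 - p) / n)

# How the uncertainty shrinks as the (random) sample grows.
for n in (10, 100, 1000, 10000):
    print(n, round(margin_of_error(0.5, n), 3))
```

The uncertainty falls only with the square root of the sample size, so a tenfold increase in observations narrows the margin by a factor of about three.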
For more on this particular case, or other similar cases of bad data journalism, see Alberto Cairo’s blog.
The Don’ts of storytelling with numbers
Although deeper knowledge of statistics could have gone a long way in making this particular helmet-opposing journalist aware of the errors in his reporting, a few simple rules of thumb can help journalists avoid common yet serious fallacies when presenting data-based stories. Here they are, in the form of 3 crucial Don’ts:
- Don’t make sweeping claims when confronted with ambiguous information and well-researched counter-arguments.
- Don’t mistake correlation for causality. Just because two variables are related does not mean that one influences the other.
- Don’t base a scientific argument on results observed in an unrepresentative sample, particularly if that sample consists of only one person!
Finally, as Uncle Ben famously told good old Spidey, “With great power comes great responsibility!”
Numbers exert a certain power: that of reliability, and of revealing implicit stories beyond the limits of immediate vision and comprehension. However, storytelling with numbers comes with its own set of responsibilities. We should be aware that a story based on a faulty data analysis can lead to numerous misinterpretations and hasty conclusions. Worse still, with phrasing that’s catchy enough, the story might be shared widely, spreading the misinformation still further. Interestingly, the article cited above has so far been shared 15,000 times on Facebook and 1,915 times on Twitter, and is possibly doing its fair share of brain damage around the world.