We are all being bombarded with statistics and graphs right now. Both social media and traditional news outlets are showing graphs of infections, changes in death rates, and the impact on the economy or environment. We are being overwhelmed with data, and it’s important that professionals presenting data do so clearly to a wide audience.
The example I have today is fairly benign, but frustrating to me as a professional. Recently, the BBC ran an environmental story on how air pollution had dropped year on year since the lockdown began. Setting aside for a moment that the data linked from the article is a snapshot averaging nitrogen dioxide over the two-week period, and therefore cannot directly lead to the first graph in the article, it was the graphs themselves that caught my eye.
This first graph has 2019 in red and 2020 in blue. While it is possible to argue that the 2020 levels were already following a different path to 2019, the very short timescale of the graph and the similarity in shape make for a compelling comparison. A daily breakdown is not available in the source spreadsheet, nor is it available directly on the DEFRA website, even a week after this article was written. Let’s take a look at the second graph.
This graph shows only a subset of the released results, which is not mentioned. For this graph, the colours have also been swapped: 2019 is now in blue and 2020 in red. While there is a key showing the change, most people reading an article and seeing consecutive visualisations would reasonably assume that the keys are consistent when related data is presented. This is one of the basic rules of data science communication: don’t confuse your audience!
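One way to avoid this kind of swap is to define the colour for each series once and reuse it for every chart. The snippet below is only a rough sketch of that idea in Python, not how the BBC produces its graphics. The names `YEAR_COLOURS`, `series_style` and `inconsistent_keys` are made up for illustration, and each chart is represented as a plain dictionary of series label to colour.

```python
# Hypothetical shared palette: one colour per year, defined once and reused.
YEAR_COLOURS = {2019: "red", 2020: "blue"}


def series_style(year):
    """Return the label and colour for a year, always from the shared palette."""
    return {"label": str(year), "color": YEAR_COLOURS[year]}


def inconsistent_keys(charts):
    """Return the series labels drawn in more than one colour across charts.

    Each chart is a dict mapping a series label to the colour used for it.
    """
    seen = {}        # first colour seen for each label
    clashes = set()  # labels that later appear in a different colour
    for chart in charts:
        for label, colour in chart.items():
            if seen.setdefault(label, colour) != colour:
                clashes.add(label)
    return sorted(clashes)


# The two graphs as described above: the second swaps the colours.
first_graph = {"2019": "red", "2020": "blue"}
second_graph = {"2019": "blue", "2020": "red"}

print(inconsistent_keys([first_graph, second_graph]))  # ['2019', '2020']
```

If you plot with matplotlib, the dictionary returned by `series_style` could be passed straight through as keyword arguments, for example `ax.plot(x, y, **series_style(2019))`, so every chart in a piece picks up the same key.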
While I believe that, in this instance, this was an honest mistake, it is something that can be used to misdirect readers. Please be careful when presenting your data and, as ever, even more careful when reading and interpreting data presented by others.