All models are wrong, but some are completely wrong


At this critical time in the modern history of the human race, mathematical models have been pushed into the foreground. Epidemic forecasting informs policy and even individual decision-making. Sadly, scientists and journalists have completely failed to communicate how these models work.

Last week the Financial Times published the headline ‘Coronavirus may have infected half of UK population’, reporting on a new mathematical model of COVID-19 epidemic progression. The model produced radically different results when the researchers changed the value of a parameter named ρ – the rate of severe disease amongst the infected. The FT chose to run with an inflammatory headline, assuming an extreme value of ρ that most researchers consider highly implausible.

Since its publication, hundreds of scientists have attacked the work, forcing the original authors to state publicly that they were not trying to make a forecast at all. But the damage had already been done: many other media organisations, such as the BBC, had already broadcast the headline [1].

Epidemiologists are making the same mistakes that the climate science community made a decade ago. A series of crises forced climatologists to learn painful lessons on how (not) to communicate with policy-makers and the public.

In 2010 the 4th IPCC report was attacked for containing a single error: a claim that the Himalayan glaciers would likely have completely melted by 2035 (‘Glacier Gate’). Climate denialists and recalcitrant nations such as Russia and Saudi Arabia seized on this error as a way to discredit the entire 3000-page report, which was otherwise irreproachable.

When the emails of the Climatic Research Unit (CRU) of the University of East Anglia were hacked in 2009, doubt arose over the trustworthiness of the entire climate science community. Trust was diminished because the head of the CRU refused to openly share computer code and data. The crisis was to cast a pall over the climate science community for many years.

By the time of the 5th IPCC report, mechanisms had been developed to enforce clear communication about the uncertainty surrounding predictive models, and transparency about models and data. The infectious disease community needs to learn these lessons. And learn them quickly.

Over the last few days, several infectious disease non-experts have gained media coverage for various ‘too good to be true’ (and plain wrong) coronavirus forecasts. Ideologically driven commentators have used these results to justify easing of social distancing rules, with potentially devastating consequences.

Scientists and journalists have a moral responsibility to convey the uncertainty inherent in modelling work. There is much at stake. Here we recommend a handful of rules for policy-makers, journalists and scientists.

Rule 1. Scientists and journalists should express the level of uncertainty associated with a forecast

All mathematical models contain uncertainty. This should be explicit – researchers should communicate their own certainty that a result is true. A range of plausible results should be provided, not just one extreme result.
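As a rough illustration of reporting a range rather than a single number, the sketch below runs a toy SIR epidemic model many times, each time drawing the reproduction number from a plausible range, and reports a median with a 90% interval. Everything here is an assumption made for illustration: the model, the function names, the five-day infectious period and the range of 2.0 to 3.0 for the reproduction number are hypothetical and are not taken from any of the models discussed in this post.

```python
import random
import statistics


def peak_infected_fraction(r0, infectious_days=5.0, days=365, initial_infected=1e-4):
    """Peak fraction of the population infected at once in a toy SIR model.

    The model advances in daily steps. All parameter values are illustrative.
    """
    gamma = 1.0 / infectious_days  # daily recovery rate
    beta = r0 * gamma              # daily transmission rate
    s, i = 1.0 - initial_infected, initial_infected
    peak = i
    for _ in range(days):
        new_infections = min(beta * s * i, s)  # cannot infect more than remain susceptible
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        peak = max(peak, i)
    return peak


def forecast_interval(n_draws=2000, seed=0):
    """Return (5th percentile, median, 95th percentile) of the peak infected fraction."""
    rng = random.Random(seed)
    # Hypothetical uncertainty about the reproduction number: uniform on [2.0, 3.0].
    peaks = sorted(
        peak_infected_fraction(rng.uniform(2.0, 3.0)) for _ in range(n_draws)
    )
    low = peaks[int(0.05 * n_draws)]
    high = peaks[int(0.95 * n_draws) - 1]
    return low, statistics.median(peaks), high


low, median, high = forecast_interval()
print(f"Peak infected: median {median:.0%}, 90% interval {low:.0%} to {high:.0%}")
```

A headline built on this output would quote the interval alongside the median, not just whichever end makes the better story.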

Rule 2. Journalists must get quotes from other experts before publishing

The worst cases of poor COVID-19 journalism have broken this simple rule. Other scientists have weighed in after publication. But by then a misleading article has reached an audience of millions and taken hold in the public consciousness.

Rule 3. Scientists should clearly describe the critical inputs and assumptions of their models 

How sensitive is the model to the input parameters? How sure are you of those parameters? Do other researchers disagree?
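A deliberately crude back-calculation shows why a parameter like ρ matters so much. This is not the Oxford model: the function name and the count of severe cases are hypothetical, and ρ is taken to mean the rate of severe disease amongst the infected, as described earlier in this post.

```python
def implied_infections(observed_severe_cases, rho):
    """Infections implied by a count of severe cases, if a fraction rho of
    infections become severe. A back-of-envelope sketch, not a real model."""
    if not 0 < rho <= 1:
        raise ValueError("rho must be in the interval (0, 1]")
    return observed_severe_cases / rho


observed = 10_000  # hypothetical number of severe cases
for rho in (0.001, 0.01, 0.1):
    estimate = implied_infections(observed, rho)
    print(f"rho = {rho}: about {estimate:,.0f} implied infections")
```

The same observed data imply a hundred times more infections when ρ is a hundred times smaller, so any conclusion rests almost entirely on how well ρ is known.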

Rule 4. Be as transparent as possible

Release data and code so that scientific scrutiny can take place. Consider open peer-review so that other experts can quickly give their opinion on a piece of work.

Rule 5. Policy-makers should use multiple models to inform policy

The Imperial College model created by Neil Ferguson has been reported on almost exclusively as the modelling input to UK pandemic policy. Have other models from other groups been considered? What is the degree of agreement between the models?

Rule 6. Indicate when a model was produced by somebody without a background in infectious diseases 

Would we encourage an epidemiologist to apply ‘fresh thinking’ to the design of an electrical substation? Perhaps we should treat with caution the predictions of electrical engineers about pandemic disease outbreaks.

Martin Goodson (Chair of the RSS Data Science Section)


[1] Post-publication, the FT have modified the report text but have left the headline unchanged.

Thanks to Danielle Belgrave, Piers Stobbs, Lucy Hayes and Adam Davison for helpful comments


Published by dssaisection

Chair of the Royal Statistical Society Data Science Section.

23 thoughts on “All models are wrong, but some are completely wrong”

  1. While I like your 6 rules generally, and I certainly didn’t like the FT’s original reporting of that Oxford model, I think it would be better if you’d used the definition of rho that was actually used in their model, which was about susceptibility to severe disease, not the rate of severe disease requiring hospitalization. And it could have been useful to point out that the Oxford researchers are indeed infectious disease modellers, and that they included a sensitivity analysis for rho and some indications of uncertainty. That doesn’t mean they were right to publish in the form they did, and without the code.


    1. Thank you Kevin – I’ve made the change you suggested. To be fair on the FT, the Oxford team confirmed with them that they were happy with their report. The academics must share some of the responsibility here. Also see commentary here –

      Although the authors may defend their modelling as simply exploring possible scenarios, they nevertheless have left a dangerously misleading statement in their article that was also quoted by the Financial Times:

      “Importantly, the results we present here suggest the ongoing epidemics in the UK and Italy … have already led to the accumulation of significant levels of herd immunity in both countries.”

      Their modelling does not suggest that. It is just one of many possibilities, if no other data are considered. Such data exists and tends to contradict the authors’ conclusion.


      1. I use the phrase “herd immunity” as a marker for infectious disease experts not attuned to effective popular communication. People do not think of themselves as part of a herd. “Population immunity” is an easy substitution, speaks to popular understanding, and contributes to motivation for well-advised actions.

