The Coronavirus Outbreak Didn’t Start In August

The tale of conspiracies and some problematic science

Pictured: More recent than August Source: Pexels

Note: this blog has been updated after an email from the authors clarifying some points. The study did not use the national search trends from Baidu; it used searches from Wuhan only. Similarly, the authors did not standardize by time of day, but since most of the photos were taken within a 3-hour window they felt that this was unnecessary. The study is being amended to reflect these points.

One question that’s been ongoing throughout the entire coronavirus pandemic has been very simple but somewhat difficult to answer — how did this all start? The blame has largely fallen on the Wuhan live food market, but increasingly people are asking whether the outbreak might have started somewhere else, perhaps even earlier than reported.

Pictured: hard to pin down Source: Pexels

And this week stories have emerged from media around the world saying that not only did the virus start earlier than we previously thought, it may have been around as early as August 2019. According to Harvard academics, the reports say, something was happening in China months before the official reports recognized COVID-19 as an issue. And not just because of some vague conspiracies — this is, apparently, science.

Unfortunately, while it makes a great story — and plays into delightful conspiracies — it is also almost certainly complete nonsense.

Coronavirus probably wasn’t spreading around China as early as August, no matter what the headlines have said.

The study that has everyone excitedly discussing whether coronavirus emerged months before we think it did was an interesting piece of work. A group of academics took satellite photos of 5 hospitals in Wuhan, the city in China where the COVID-19 outbreak first happened, at 111 time points from 2018 to 2020. They manually counted the cars parked in each hospital's parking lots and mapped the trend in car volumes over time. They also looked at search trends on Baidu, a Chinese search engine, for “cough” and “diarrhea”, because these are two symptoms that have been associated with COVID-19.

The problem with fancy suits and diarrhea symptoms is, uh, ease of removal in a time of crisis Source: Pexels

They found that there appeared to be an increased volume of cars parking in the parking lots of these hospitals, as well as a modest increase in the Wuhan search trend for cough and diarrhea, earlier than the reported COVID-19 dates in China. This, they argued, was evidence that coronavirus was actually spreading much earlier than initially thought, potentially as early as August 2019.

Except that is, broadly speaking, total rubbish.

Firstly, the diarrhea point. While the authors argue that it is a primary symptom of COVID-19, the paper that they reference to support this actually only found that 17% of patients had diarrhea, in a very small sample. If you look at the best evidence on the topic, it seems that somewhere around 5% of people with COVID-19 get diarrhea, which makes the idea that you can definitively establish a connection to the disease with search terms for the symptom a bit problematic.

But that also raises the question of the search terms themselves. The study reportedly used search terms that do not match how people commonly search in Chinese: “symptoms of diarrhea” instead of “diarrhea”. According to a lengthy thread on Twitter, when you input the more common terms, you actually see the opposite trend to the one reported in the study. There’s also the issue that Wuhan saw an increase in the rate of searches for “symptoms of diarrhea”, but so did the rest of China. This makes it very unlikely that these search terms reflect an outbreak in Wuhan specifically, unless the authors are arguing that the entirety of China was seeing infections in August 2019.
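To make the "so did the rest of China" point concrete, here is a minimal sketch of the comparison you would want: subtract the rise seen everywhere else from the rise seen in Wuhan, and look at what is left. The function names and every number below are hypothetical, invented purely for illustration. They are not figures from the study or from Baidu.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (0.15 means +15%)."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before


def excess_over_reference(local_before: float, local_after: float,
                          ref_before: float, ref_after: float) -> float:
    """Local relative change minus the reference region's relative change."""
    return (relative_change(local_before, local_after)
            - relative_change(ref_before, ref_after))


# Made-up search-index values, for illustration only (NOT data from the study).
wuhan_2018, wuhan_2019 = 100.0, 115.0
rest_2018, rest_2019 = 100.0, 114.0

excess = excess_over_reference(wuhan_2018, wuhan_2019, rest_2018, rest_2019)
print(f"Wuhan rise beyond the rise elsewhere: {excess:.1%}")
```

With these invented inputs, a 15% rise in Wuhan looks dramatic on its own but nearly vanishes once the 14% rise everywhere else is taken into account. A rise that shows up nationwide can't pin an outbreak to one city.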

People sometimes forget this, but China is BIG Source: Pexels

There were issues with the parking lot data as well. As people have pointed out on Twitter, there are many reasons that you might see more cars parked at some cherry-picked locations in Wuhan. Hospitals build new car parks all the time, they renovate and grow, and they remove tree and building cover that might be blocking satellite photos. At the absolute minimum, you’d need to control for the capacity of the hospitals themselves before drawing any sort of conclusion about the cars parked around them, which the study failed to do.
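As a rough sketch of what controlling for capacity could look like, you can compare the share of spaces that are filled rather than the raw car count. The `Snapshot` class and all of the numbers here are hypothetical, made up for illustration. The study did not report capacity figures like these.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One satellite photo of a hospital car park (hypothetical structure)."""
    cars: int    # cars counted in the photo
    spaces: int  # parking spaces available when the photo was taken

    @property
    def occupancy(self) -> float:
        """Share of available spaces that are filled."""
        if self.spaces <= 0:
            raise ValueError("spaces must be positive")
        return self.cars / self.spaces


# Invented example: the hospital expands its car park between the two photos.
earlier = Snapshot(cars=200, spaces=250)
later = Snapshot(cars=260, spaces=325)

raw_change = (later.cars - earlier.cars) / earlier.cars
print(f"Raw car count change: {raw_change:.0%}")
print(f"Occupancy: {earlier.occupancy:.0%} then {later.occupancy:.0%}")
```

In this invented case the raw count rises by 30%, but the car park is exactly as full as before because it simply got bigger. Without capacity data, the two situations look identical from space.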

It gets worse. The authors reported standardizing by day — i.e. comparing a Sunday in 2019 to one in 2018 — but not by time. It’s entirely possible that the differences in the photos simply represent the difference between the number of people attending hospitals at different times of day. While the authors have since clarified that the photos were all time-stamped between 11am and 2pm, temporal variation is not unlikely given the small number of pictures (remember, just 111 days over two entire years).

There were other issues here. The authors validated their data using information from two sentinel sites in Wuhan, but the data from these hospitals were very limited. There was no comparison to other places in China, to see if this might have been a national trend of some kind, and no other way of verifying the information.

And even without all of this, the results aren’t exactly impressive. It’s not particularly damning to note that Wuhan search trends for diarrhea were slightly higher in August 2019 than in August 2018, or that a few hospitals in Wuhan had one or two more cars parked there than before. The findings may sound worrying in a headline, but honestly it’s hard to say much from these very tenuous connections. If nothing else, we’d want to check this against a wide range of other controls before drawing any conclusions. For example, anosmia (losing your sense of smell) is a commonly reported symptom of COVID-19, but much less common in other conditions. How do searches for it track against the data in the study?

Stock photo results for “connected” are very uplifting, but also wrong Source: Pexels

At best, we can say that the study found a few vaguely interesting correlations, but reading any more into the results is a bit absurd.

So when did coronavirus start?

Well, it’s hard to say with absolute certainty, but the best evidence overwhelmingly points to late November/early December 2019. There may have been some infections before that, but given how infectious the virus is, it seems most likely that the initial outbreak happened almost exactly as described.

What we can say, with some certainty, is that this evidence doesn’t show that COVID-19 emerged months earlier than reported. Even if you ignore the glaring flaws and serious issues with the study, the results are circumstantial and not very convincing.

Conspiracy theories aside, the reality is that there is just no evidence that coronavirus caused a huge outbreak in China in August 2019.

The headlines were wrong.

