Being isolated from home during the COVID-19 outbreak made me recall my final year at Palmerston North Boys’ High School. That year, we were lucky enough to take introductory courses through Massey University, and one of these courses was a brief dive into statistics.
To most people, statistics is a tedious jumble of numbers in congested tables, producing confusing line graphs and bar charts. However, I learnt in high school that statistics is far more than this. It is a form of critical thinking: it lets us understand the relationships between numbers and what those numbers mean in the real world.
Statistics is so important and powerful that it acts as judge and jury for most disciplines in science, the study of how we and the world work.
Statistical knowledge is important because it helps you critically analyse the data laid out in front of you. By reading this, I hope you will gain some tools to further your ability to think critically, and if you knew all this already, then please let me know that I am right.
We will use the recent COVID-19 outbreak, and how its numbers have been treated by the media, to look at two principles: Simpson’s Paradox and confounding variables.
As a disclaimer, the events of COVID-19 are changing daily. The data here are likely to change, and I would encourage you to check a trusted source such as the World Health Organisation (WHO) to keep up to date. The statistical principles, however, are timeless. Also, thanks to Jason Dhana and John Boyle for curating and collecting data on New Zealand’s COVID-19 cases, which has been incorporated into the featured image for this post.
The mortality rate is 3.4%
On the 3rd of March 2020, the Director-General of WHO at the time of writing, Dr. Tedros Adhanom Ghebreyesus, announced that the mortality rate among reported cases of the novel coronavirus (2019-nCoV) was 3.4%[1].
The mortality rate is simply the number of deaths divided by the number of people infected. Multiplying this value by one hundred gives a percentage.
A rate of 3.4% would indicate that for every 100 people infected, roughly 3 to 4 are predicted to die from the virus.
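To make the arithmetic concrete, here is a minimal Python sketch of the formula, mortality rate (%) = deaths ÷ infected × 100. The function name `mortality_rate` and the counts are hypothetical, chosen only so the result lands on 3.4%; they are not the figures WHO used.

```python
def mortality_rate(deaths: int, infected: int) -> float:
    """Return deaths per 100 infected people, as a percentage."""
    if infected <= 0:
        raise ValueError("infected must be a positive count")
    return deaths / infected * 100


# Hypothetical illustrative counts, not real COVID-19 data.
deaths = 34
infected = 1000
rate = mortality_rate(deaths, infected)
print(f"Mortality rate: {rate:.1f}%")
print(f"Expected deaths per 100 infected: {rate:.1f}")
```

Note that the denominator is the number of people infected, not the whole population, which is why the result reads as "per 100 infected".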
Can we reliably use this figure to represent everyone, of all ages?
Before we look at the data closely, I am going to introduce the principle of Simpson’s Paradox. It occurs when a trend that appears within separate groups of data weakens, disappears or even reverses once the groups are combined. By pooling all the data together, we lose its true message, and this can lead to conclusions that are actually the opposite of reality.
A famous example of this is the University of California, Berkeley, which was accused of gender bias in its graduate admissions. Overall, a higher proportion of male applicants were admitted than female applicants, which suggested that men were being unfairly favoured over women. However, when looking closely at admissions for each department, women were often admitted at a higher rate than men.
The overall admission rate did not represent what was happening within individual departments. Women tended to apply to more competitive departments that admitted a smaller share of their applicants, while men applied more often to departments with high admission rates, and pooling the data hid this[2].
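To see how pooling can reverse a trend, here is a small Python sketch with two made-up departments. The numbers are hypothetical, not the real Berkeley figures; they are chosen only to mirror the pattern described above, where women have the higher admission rate in each department yet the lower rate overall, because most women apply to the harder department.

```python
# Hypothetical admissions data (NOT the real Berkeley figures):
# department -> group -> (applicants, admitted)
admissions = {
    "Department A": {"men": (800, 480), "women": (100, 65)},
    "Department B": {"men": (200, 40), "women": (900, 225)},
}


def rate(applied: int, admitted: int) -> float:
    """Admission rate as a percentage."""
    return admitted / applied * 100


def pooled_rate(group: str) -> float:
    """Admission rate for a group after adding all departments together."""
    applied = sum(d[group][0] for d in admissions.values())
    admitted = sum(d[group][1] for d in admissions.values())
    return rate(applied, admitted)


# Within each department, compare admission rates for men and women.
for dept, groups in admissions.items():
    men = rate(*groups["men"])
    women = rate(*groups["women"])
    print(f"{dept}: men {men:.0f}%, women {women:.0f}%")

# Pooling the departments reverses the comparison.
print(f"Overall: men {pooled_rate('men'):.0f}%, women {pooled_rate('women'):.0f}%")
```

This prints 60% vs 65% and 20% vs 25% within the departments, but 52% vs 29% overall. Both groups have 1,000 applicants in total, so the reversal comes from where people applied, not from how many applied.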
Our COVID-19 example is not such an extreme case: we do not see a reversal in trend, but we do see the data misrepresented in ways that can lead to misunderstanding.
Please note that if you were to total the percentages in this table, they would not add up to 100 percent. This is because each mortality rate is specific to its age group: instead of dividing total deaths by total infected, we divide the deaths in a particular age group by the number infected in that age group.
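The age-specific calculation can be sketched in Python as follows. The age bands and counts are entirely hypothetical, picked so the pooled rate comes out at 3.4%; the point is only to show that group rates need not sum to 100% and that the overall rate is a weighted average of the group rates, weighted by each group's share of infections.

```python
# Hypothetical counts by age group (NOT real COVID-19 data), chosen so the
# pooled rate works out to 3.4%.
cases = {
    "0-39": {"infected": 5000, "deaths": 10},
    "40-69": {"infected": 4000, "deaths": 120},
    "70+": {"infected": 1000, "deaths": 210},
}


def rate(deaths: int, infected: int) -> float:
    """Deaths per 100 infected, as a percentage."""
    return deaths / infected * 100


# Age-specific rate: deaths in the group / infected in the same group.
group_rates = {age: rate(c["deaths"], c["infected"]) for age, c in cases.items()}

# Overall rate: total deaths / total infected.
total_deaths = sum(c["deaths"] for c in cases.values())
total_infected = sum(c["infected"] for c in cases.values())
overall = rate(total_deaths, total_infected)

# The overall rate equals the group rates weighted by each group's share of
# infections, so a large low-risk group pulls it down.
weighted = sum(
    group_rates[age] * c["infected"] / total_infected for age, c in cases.items()
)

for age, r in group_rates.items():
    print(f"{age:>5}: {r:5.1f}%")
print(f"Overall: {overall:.1f}%")
print(f"Weighted average of group rates: {weighted:.1f}%")
print(f"Sum of group rates: {sum(group_rates.values()):.1f}% (not 100%)")
```

With these made-up counts the age-specific rates are 0.2%, 3.0% and 21.0%, summing to 24.2%, while the pooled rate is 3.4%: too high for the youngest group and far too low for the oldest.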
We can see that the overall rate of 3.4% grossly underestimates the death rate in the older population and, conversely, overestimates it in the younger population.
This can result in two incorrect mindsets:
- A younger person may think they have a 3 to 4 in 100 chance of dying, when their actual chance is much lower.
- On the flip side, and possibly more importantly, an elderly person could become complacent, thinking that 3.4% is quite low, when the actual rate for their age group is much higher.
We can see that the general mortality rate does not represent everyone. It is important not to over-generalise data, as this ignores factors like age. On top of this, co-morbidities (other conditions a person may have that reduce life expectancy, such as lung problems, diabetes and high blood pressure) can also increase the chance of dying from the virus, and this too is masked by generalisation.
Masks or No Masks
Statistics often aims to model the relationship between variables. However, what looks like an obvious causal link may in fact be driven by an underlying hidden variable. An example of this relates to mask-wearing and the spread of infection.
The authors of the website have diligently highlighted the distinct trajectories of two groups. Countries considered low mask-wearers had higher rates of infection than countries where mask-wearing was widespread.
So, can we conclude that wearing masks will slow the rate of infections?
What we are seeing here is a correlation: high mask-wearing goes together with a low infection rate.
However, as always, we need to look more closely, especially at what other measures these high mask-wearing countries took to reduce spread.
Another example is Japan. Japan tested far less than South Korea. But could fewer tests mean fewer positive results? This is a difficult question to answer. Japanese society is known for its high standards of cleanliness, which depend on broad participation across society; this could also translate into high compliance during a crisis, but there is no certainty about this either[4].
In the end, we cannot say definitively that wearing masks will reduce infection spread, because there are so many variables at play. I am postulating that high use of masks is indicative of widespread public compliance with hygiene measures, including hand washing, isolating when ill and wearing masks, and that this compliance may be the key factor in reducing spread. In other words, public compliance is a likely confounding variable: it drives both mask-wearing and lower infection rates. Mask-wearing is simply what we see, so it is what we assume to be the cause.
WHO has left mask-wearing out of its general recommendations, opting instead for the following ways to reduce the spread of 2019-nCoV[5]:
- Wash your hands frequently
- Maintain social distancing
- Avoid touching your face and mouth
- Practise respiratory hygiene: cough into your elbow
- Seek medical care early if you have a fever, cough and difficulty breathing
- Stay informed and follow advice from your healthcare provider
WHO advised that masks should be worn[6]:
- if you are healthy, only when you are taking care of someone suspected of having a 2019-nCoV infection, or
- if you are coughing or sneezing yourself.
Note also that masks are only effective when combined with frequent hand washing.
Furthermore, the transmission of 2019-nCoV is similar to that of influenza[7].
On current understanding, 2019-nCoV is released in large droplets (>5-10 μm), which expose people in close contact (within 1 m). When these droplets settle on surfaces, they can cause indirect transmission if another person touches the surface and then their face.
This is in contrast to airborne pathogens, such as the bacterium responsible for tuberculosis[8], which can linger in the air for long periods of time.
This is consistent with WHO’s recommendations to wash hands, keep social distance and limit touching one’s face.
We must remember that masks are a scarce resource[9].
The economic principle of scarcity means that we have to think carefully about how we prioritise limited resources, such as masks.
I would consider WHO’s recommendations ‘big wins’ in reducing infection spread; they work well in public health campaigns because they can be practised without much strain on resources.
This allows resources to be allocated properly, so that hospitals, doctors’ clinics and essential workers are well stocked with masks.
Given the data, is it effective to recommend masks, a strained resource, for everyone? Or should we focus on public awareness of handwashing, distancing and limiting face touching, allowing masks to be allocated more appropriately?
We covered Simpson’s Paradox, where generalising the mortality rate for 2019-nCoV hides the mortality rates of particular age groups, causing misrepresentation.
Finally, we looked at confounding variables, where a hidden factor can make a more obvious variable look like the cause. It is most likely the social policies that countries adopt, and not mask-wearing alone, that result in the distinct reduction in infection rates we are seeing.
Day by day, I am hoping that we get through the COVID-19 pandemic and are back to normal life soon. In the meantime, I hope these two statistical principles have given you the ability to decipher data better.
These principles extend beyond the realm of COVID-19, so they will stay useful long after we get out of this.
- WHO Director-General’s opening remarks at the media briefing on COVID-19 – 3 March 2020
- Sex Bias in Graduate Admissions: Data from Berkeley
- Coronavirus cases have dropped sharply in South Korea. What’s the secret to its success?
- (COVID-19) Advice for the public | WHO
- (COVID-19) advice for the public: When and how to use masks | WHO
- Modes of transmission of virus causing COVID-19: implications for IPC precaution recommendations | WHO
- Chapter 2: Transmission and Pathogenesis of Tuberculosis | Core Curriculum on Tuberculosis: What the Clinician Should Know | Centers for Disease Control and Prevention
- Rational use of face masks in the COVID-19 pandemic | The Lancet