Simpson's Paradox in COVID-19 Case Fatality Rates: A Mediation Analysis of Age-Related Causal Effects

We point out an instance of Simpson's paradox in COVID-19 case fatality rates (CFRs): comparing a large-scale study from China (February 17) with early reports from Italy (March 9), we find that CFRs are lower in Italy for every age group, but higher overall. This phenomenon is explained by a stark difference in case demographics between the two countries. Using this as a motivating example, we introduce basic concepts from mediation analysis and show how they can be used to quantify different direct and indirect effects when assuming a coarse-grained causal graph involving country, age, and case fatality. We curate an age-stratified CFR dataset with more than 750,000 cases and conduct a case study, investigating total, direct, and indirect (age-mediated) causal effects between different countries and at different points in time. This allows us to separate age-related effects from others unrelated to age and facilitates a more transparent comparison of CFRs across countries at different stages of the COVID-19 pandemic. Using longitudinal data from Italy, we discover a sign reversal of the direct causal effect in mid-March, which temporally aligns with the reported collapse of the healthcare system in parts of the country. Moreover, we find that direct and indirect effects across 132 pairs of countries are only weakly correlated, suggesting that a country's policy and case demographics may be largely unrelated. We point out limitations and extensions for future work and, finally, discuss the role of causal reasoning in the broader context of using AI to combat the COVID-19 pandemic.

Impact Statement: During a global pandemic, understanding the causal effects of risk factors such as age on COVID-19 fatality is an important scientific question. Since randomised controlled trials are typically infeasible or unethical in this context, causal investigations based on observational data, such as the one carried out in this article, will be crucial in guiding our understanding of the available data. Causal inference, in particular mediation analysis, can be used to resolve apparent statistical paradoxes; help educate the public and decision-makers alike; avoid unsound comparisons; and answer a range of causal questions pertaining to the pandemic, subject to transparently stated assumptions. Our exposition helps clarify how mediation analysis can be used to investigate direct and indirect effects along different causal paths and thus serves as a stepping stone for future studies of other important risk factors for COVID-19 besides age.
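To make the decomposition concrete, the following minimal sketch reproduces a Simpson's-paradox pattern and splits the difference in overall CFR into direct and age-mediated parts. It assumes the coarse-grained graph country → age → fatality with an additional direct edge country → fatality, and no unobserved confounding, so that interventional quantities can be read off from stratified rates. The effect definitions follow the standard natural direct and indirect effects from the mediation literature. The two countries "A" and "B", the two age groups, all numbers, and all function names are hypothetical and chosen only for illustration; they are not taken from the curated dataset.

```python
# Hypothetical illustrative inputs (NOT real data).
# Index 0 = younger cases, index 1 = older cases.
age_share = {
    "A": [0.8, 0.2],  # P(age group | country A): mostly younger cases
    "B": [0.3, 0.7],  # P(age group | country B): mostly older cases
}
cfr_by_age = {
    "A": [0.010, 0.100],  # P(death | age group, country A)
    "B": [0.005, 0.080],  # P(death | age group, country B): lower in every group
}


def expected_cfr(mortality_from, ages_from):
    """CFR obtained by combining the age-specific CFRs of one country
    with the case age distribution of a (possibly different) country."""
    return sum(
        m * p for m, p in zip(cfr_by_age[mortality_from], age_share[ages_from])
    )


def total_effect(c0, c1):
    """Change in overall CFR when switching country from c0 to c1."""
    return expected_cfr(c1, c1) - expected_cfr(c0, c0)


def natural_direct_effect(c0, c1):
    """Switch the age-specific CFRs to c1, keep the age distribution of c0."""
    return expected_cfr(c1, c0) - expected_cfr(c0, c0)


def natural_indirect_effect(c0, c1):
    """Switch the age distribution to c1, keep the age-specific CFRs of c0."""
    return expected_cfr(c0, c1) - expected_cfr(c0, c0)


if __name__ == "__main__":
    print(f"overall CFR A: {expected_cfr('A', 'A'):.4f}")
    print(f"overall CFR B: {expected_cfr('B', 'B'):.4f}")
    print(f"total effect A->B:    {total_effect('A', 'B'):+.4f}")
    print(f"direct effect A->B:   {natural_direct_effect('A', 'B'):+.4f}")
    print(f"indirect effect A->B: {natural_indirect_effect('A', 'B'):+.4f}")
```

With these made-up inputs, country B has the lower CFR in both age groups yet the higher overall CFR: the direct effect of switching from A to B is negative, while the age-mediated indirect effect is positive and larger in magnitude. Note that the direct and indirect effects in the same direction do not generally sum to the total effect; instead, the total effect from A to B equals the direct effect from A to B minus the indirect effect from B to A.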
