Automatically determining cause of death from verbal autopsy narratives

Background

A verbal autopsy (VA) is a post-hoc written interview report of the symptoms preceding a person’s death in cases where no official cause of death (CoD) was determined by a physician. Current leading automated VA coding methods primarily use structured data from VAs to assign a CoD category. We present a method to automatically determine CoD categories from VA free-text narratives alone.

Methods

After preprocessing and spelling correction, our method extracts word frequency counts from the narratives and uses them as input to four different machine learning classifiers: naïve Bayes, random forest, support vector machines, and a neural network.

Results

For individual CoD classification, our best classifier achieves a sensitivity of .770 for adult deaths for 15 CoD categories (as compared to the current best reported sensitivity of .57), and .662 with 48 WHO categories. When predicting the CoD distribution at the population level, our best classifier achieves .962 cause-specific mortality fraction accuracy for 15 categories and .908 for 48 categories, which is on par with leading CoD distribution estimation methods.

Conclusions

Our narrative-based machine learning classifier performs as well as classifiers based on structured data at the individual level. Moreover, our method demonstrates that VA narratives provide important information that can be used by a machine learning system for automated CoD classification. Unlike the structured questionnaire-based methods, this method can be applied to any verbal autopsy dataset, regardless of the collection process or country of origin.
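
To make the narrative pipeline described in the Methods concrete, the sketch below shows preprocessing, spelling correction, word frequency counting, and one of the four classifiers (multinomial naïve Bayes). It is a minimal illustration under stated assumptions, not the authors' implementation. The spelling correction is an assumption: it replaces a token with a close match from a small hypothetical vocabulary (`VOCAB`) using Python's `difflib.get_close_matches` with an assumed similarity cutoff of 0.8. The training narratives and cause labels are invented toy examples, not VA data.

```python
import math
import re
from collections import Counter, defaultdict
from difflib import get_close_matches

# Hypothetical reference vocabulary for spelling correction (an assumption:
# the source does not say which dictionary or corrector was used).
VOCAB = ["fever", "cough", "chest", "pain", "vomiting", "diarrhoea", "blood", "breath"]


def preprocess(narrative, vocab=VOCAB):
    """Lowercase, keep alphabetic tokens, and replace each token with its closest
    vocabulary word when the similarity ratio is at least 0.8."""
    tokens = re.findall(r"[a-z]+", narrative.lower())
    corrected = []
    for tok in tokens:
        match = get_close_matches(tok, vocab, n=1, cutoff=0.8)
        corrected.append(match[0] if match else tok)
    return corrected


def word_counts(narrative):
    """Bag-of-words word frequency counts for one narrative."""
    return Counter(preprocess(narrative))


class MultinomialNB:
    """Multinomial naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_totals = defaultdict(Counter)  # per-class summed word counts
        self.vocab = set()
        for counts, label in zip(docs, labels):
            self.word_totals[label].update(counts)
            self.vocab.update(counts)
        return self

    def predict(self, counts):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_totals[c].values())
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for word, n in counts.items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_totals[c][word] + 1) / (total + v)
                score += n * math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best


if __name__ == "__main__":
    # Toy narratives invented for illustration only; not real VA data.
    train = [
        ("Patient had fevr and cough for two weeks", "pneumonia"),
        ("severe cough, chest pain and short breath", "pneumonia"),
        ("vomiting and diarrhoea for three days", "diarrhoeal"),
        ("watery diarrhea with vomitting", "diarrhoeal"),
    ]
    model = MultinomialNB().fit([word_counts(t) for t, _ in train], [y for _, y in train])
    print(model.predict(word_counts("cough and fever with chest pain")))  # pneumonia
```

In a real system the other three classifiers (random forest, support vector machines, neural network) would consume the same word-count features; only the model trained on them changes.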

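The population-level results are reported as cause-specific mortality fraction (CSMF) accuracy. The sketch below computes it with the definition commonly used in VA validation studies, CSMF accuracy = 1 − Σ_j |CSMF_j(true) − CSMF_j(pred)| / (2(1 − min_j CSMF_j(true))), where j ranges over the full cause list. The function names and the cause labels `A`, `B`, `C` are hypothetical placeholders, not taken from the paper.

```python
from collections import Counter


def csmf(labels, causes):
    """Cause-specific mortality fraction: share of all deaths assigned to each cause."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts[c] / n for c in causes}


def csmf_accuracy(true_labels, pred_labels, causes):
    """CSMF accuracy = 1 - sum_j |true_j - pred_j| / (2 * (1 - min_j true_j)).

    `causes` is the full cause list (e.g. the 15 or 48 categories), so a cause
    with no true deaths has a true CSMF of 0 and enters the minimum.
    """
    true_f = csmf(true_labels, causes)
    pred_f = csmf(pred_labels, causes)
    total_error = sum(abs(true_f[c] - pred_f[c]) for c in causes)
    return 1 - total_error / (2 * (1 - min(true_f.values())))


if __name__ == "__main__":
    causes = ["A", "B", "C"]
    true = ["A", "A", "B", "C"]
    pred = ["A", "B", "B", "C"]  # one A death misassigned to B
    print(csmf_accuracy(true, pred, causes))  # approximately 0.667
```

Because the metric compares only cause fractions, individual errors that offset each other do not reduce it: predicting `["B", "A"]` for true labels `["A", "B"]` gives zero correct individual assignments but a CSMF accuracy of 1.0. This helps explain how population-level CSMF accuracy (.962) can sit well above individual-level sensitivity (.770).
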