Automated verbal autopsy classification: using one-against-all ensemble method and Naïve Bayes classifier

Verbal autopsy (VA) uses post-mortem surveys to retrospectively assign causes of death (COD), mostly in low- and middle-income countries where the majority of deaths occur at home rather than in a hospital; the resulting estimates support evidence-based health system strengthening. Automated algorithms for VA COD assignment have been developed, and their performance has been assessed against physician and clinical diagnoses. Because the performance of automated classification methods remains low, we aimed to enhance the Naïve Bayes Classifier (NBC) algorithm so that it produces better ranked COD classifications on 26,766 deaths from four globally diverse VA datasets than some of the leading VA classification methods, namely Tariff, InterVA-4, InSilicoVA and NBC. Our strategy was to train multiple NBC algorithms using the one-against-all approach (OAA-NBC). To compare population-level agreement, we computed cumulative cause-specific mortality fraction (CSMF) accuracies for COD classifications ranked one to five. To assess individual-level COD assignments, we measured cumulative partially chance-corrected concordance (PCCC) and sensitivity for up to five ranked classifications. Overall, OAA-NBC consistently assigned CODs that were the most similar to physician and clinical COD assignments among the algorithms compared, based on cumulative CSMF accuracy, PCCC and sensitivity scores. Our approach improved classification performance (sensitivity) by between 6% and 8% compared with the other VA algorithms. Population-level agreement for OAA-NBC and NBC was similar to or higher than that of the other algorithms used in the experiments.
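
The two agreement metrics named above can be sketched as follows. This is a minimal illustration that assumes the definitions commonly used in the VA validation literature, CSMF accuracy = 1 - sum_j |CSMF_true_j - CSMF_pred_j| / (2 (1 - min_j CSMF_true_j)) and PCCC(k) = (C - k/N) / (1 - k/N), where C is the fraction of deaths whose reference cause appears among the top k ranked causes and N is the number of causes. The function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def csmf_accuracy(true_causes, predicted_causes):
    """Population-level agreement between true and predicted cause fractions.

    Assumption: the cause list is the set of causes seen in either input,
    and the minimum is taken over causes present in the reference labels.
    Undefined (division by zero) if every death has the same reference cause.
    """
    n = len(true_causes)
    true_frac = {c: k / n for c, k in Counter(true_causes).items()}
    pred_frac = {c: k / n for c, k in Counter(predicted_causes).items()}
    causes = set(true_frac) | set(pred_frac)
    abs_error = sum(
        abs(true_frac.get(c, 0.0) - pred_frac.get(c, 0.0)) for c in causes
    )
    return 1.0 - abs_error / (2.0 * (1.0 - min(true_frac.values())))


def pccc(true_causes, ranked_predictions, k, n_causes):
    """Partially chance-corrected concordance for the top-k ranked causes."""
    hits = sum(
        truth in ranked[:k]
        for truth, ranked in zip(true_causes, ranked_predictions)
    )
    concordance = hits / len(true_causes)
    chance = k / n_causes  # expected concordance of k random guesses
    return (concordance - chance) / (1.0 - chance)


# Toy example with three hypothetical causes "a", "b", "c".
truth = ["a", "a", "b", "c"]
ranked = [["a", "b"], ["b", "a"], ["c", "b"], ["c", "a"]]
top1 = [r[0] for r in ranked]

print(csmf_accuracy(truth, top1))
print(pccc(truth, ranked, k=1, n_causes=3))
print(pccc(truth, ranked, k=2, n_causes=3))
```

Evaluating `pccc` for k = 1 to 5 gives one plausible reading of a "cumulative" score over ranked classifications; how the study accumulates CSMF accuracy across ranks is not specified in this abstract.
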
Although OAA-NBC still requires improvement for individual-level COD assignment, the one-against-all approach improved its ability to assign CODs that more closely resemble physician or clinical COD classifications compared to some of the other leading VA classifiers.
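
The one-against-all idea can be sketched as follows: one binary Naïve Bayes model is trained per cause (that cause versus all others), and causes are then ranked by each model's posterior probability for its own cause. This is a minimal illustration only. It assumes binary symptom indicators, a Bernoulli likelihood and Laplace smoothing, none of which is confirmed by the abstract, and all function names and data are hypothetical.

```python
import math


def train_binary_nb(rows, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model; labels are True for the target cause."""
    n_features = len(rows[0])
    model = {}
    for cls in (True, False):
        members = [r for r, y in zip(rows, labels) if y == cls]
        # Laplace-smoothed class prior and per-symptom endorsement rates.
        prior = (len(members) + alpha) / (len(rows) + 2 * alpha)
        probs = [
            (sum(r[j] for r in members) + alpha) / (len(members) + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (math.log(prior), probs)
    return model


def positive_posterior(model, row):
    """Posterior probability that `row` belongs to the target cause."""
    log_scores = {}
    for cls, (log_prior, probs) in model.items():
        log_scores[cls] = log_prior + sum(
            math.log(p if x else 1.0 - p) for x, p in zip(row, probs)
        )
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])


def train_oaa(rows, causes):
    """Train one binary model per cause (that cause against all others)."""
    return {
        c: train_binary_nb(rows, [y == c for y in causes])
        for c in sorted(set(causes))
    }


def rank_causes(oaa_models, row, top_k=5):
    """Return up to `top_k` causes, most probable first."""
    ranked = sorted(
        oaa_models,
        key=lambda c: positive_posterior(oaa_models[c], row),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical symptom indicators: [fever, cough, injury].
rows = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]
causes = ["pneumonia", "pneumonia", "malaria", "malaria", "injury", "injury"]

models = train_oaa(rows, causes)
print(rank_causes(models, [0, 0, 1]))
print(rank_causes(models, [1, 1, 0]))
```

Because each binary model is fitted separately, the per-cause posteriors need not sum to one across causes; in this sketch they are used only to order the causes, which is all that ranked COD assignment requires.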
