Explainable statistical learning in public health for policy development: the case of real-world suicide data

In recent years, the volume of publicly available public health data has increased significantly. These data have substantial potential to inform public health policy, but only if they are analysed in a meaningful and insightful way. Our aim is to demonstrate how data analysis techniques can address data reduction, prediction and explanation using publicly available online public health data, in order to provide a sound basis for informing public health policy. Observational suicide prevention data were analysed from an existing online United Kingdom national public health database. Multicollinearity analysis and principal component analysis were used to reduce correlated data, followed by regression analyses for prediction and explanation of suicide. Multicollinearity analysis reduced the set of indicator predictors by 30%, and principal component analysis further reduced the set by 86%. Regression for prediction identified four significant indicator predictors of suicide behaviour (emergency hospital admissions for intentional self-harm, children leaving care, statutory homelessness and self-reported well-being/low happiness) and two main component predictors (relatedness dysfunction, and behavioural problems and mental illness). Regression for explanation identified significant moderation of a well-being predictor (low happiness) of suicide behaviour by a social factor (living alone), thereby supporting existing theory and providing insight beyond the results of regression for prediction. Two independent predictors capturing relatedness needs in social care service delivery were also identified. We demonstrate the effectiveness of regression techniques in the analysis of online public health data. Regression for prediction and regression for explanation can both be appropriate for analysing public health data and for gaining a better understanding of public health outcomes.
It is therefore essential to clarify the aim of the analysis (prediction accuracy or theory development) as a basis for choosing the most appropriate model. We apply these techniques to suicide data; however, we argue that the analysis presented in this study should be applied to datasets across public health in order to improve the quality of health policy recommendations.
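The abstract names three analysis steps (multicollinearity screening, principal component analysis, and regression, including a moderation test) without giving their implementation. The sketch below shows one common way to carry them out in Python with NumPy; it is not the authors' analysis. The variance inflation factor (VIF) cut-off of 10, the 80% explained-variance threshold, the helper names (`vif`, `drop_collinear`, `pca_scores`, `ols`) and all of the synthetic data are assumptions made for illustration. The variables `a` to `e`, `x` and `m` merely stand in for the real indicators.

```python
import numpy as np


def vif(X):
    """Variance inflation factor per column: VIF_j = 1 / (1 - R_j^2), where R_j^2
    comes from an OLS regression (with intercept) of column j on the other columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def drop_collinear(X, names, threshold=10.0):
    """Repeatedly drop the predictor with the largest VIF until all VIFs <= threshold
    (the threshold of 10 is an assumed rule of thumb)."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        v = vif(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        del keep[worst]
    return X[:, keep], [names[i] for i in keep]


def pca_scores(X, var_explained=0.8):
    """Standardise the columns, then keep the fewest principal components whose
    cumulative share of variance reaches var_explained (an assumed cut-off)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)
    k = min(int(np.searchsorted(np.cumsum(ratio), var_explained)) + 1, len(s))
    return Z @ Vt[:k].T, ratio[:k]


def ols(X, y):
    """OLS with an intercept; returns coefficients and classical standard errors."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (A.shape[0] - A.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return beta, se


rng = np.random.default_rng(0)
n = 300

# Data reduction on hypothetical indicators: b is almost a copy of a (severe
# collinearity); d and e are moderately correlated with a and c respectively.
a = rng.normal(size=n)
b = a + 0.05 * rng.normal(size=n)
c = rng.normal(size=n)
d = a + 0.5 * rng.normal(size=n)
e = c + 0.5 * rng.normal(size=n)
X = np.column_stack([a, b, c, d, e])
X_kept, kept = drop_collinear(X, ["a", "b", "c", "d", "e"])
scores, shares = pca_scores(X_kept)  # component scores can feed a predictive regression

# Explanation: moderation tested as an interaction term on mean-centred variables.
x = rng.normal(size=n)            # hypothetical predictor (e.g. a low-happiness score)
m = rng.binomial(1, 0.3, size=n)  # hypothetical binary moderator (e.g. living alone)
y = 0.5 * x + 0.2 * m + 0.8 * x * m + rng.normal(scale=0.5, size=n)
xc, mc = x - x.mean(), m - m.mean()
beta, se = ols(np.column_stack([xc, mc, xc * mc]), y)
t_interaction = beta[3] / se[3]

print("kept after VIF screening:", kept)
print("components retained:", scores.shape[1], "variance shares:", np.round(shares, 3))
print("interaction coefficient: %.3f (t = %.1f)" % (beta[3], t_interaction))
```

Centring `x` and `m` before forming the product term is a common choice: it makes each main-effect coefficient interpretable at the mean of the other variable, and it leaves the interaction coefficient itself unchanged. A predictive analysis would instead judge the retained indicators or component scores by out-of-sample accuracy rather than by the significance of individual coefficients.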