Machine learning in the social and health sciences

The uptake of machine learning (ML) approaches in the social and health sciences has been rather slow, and research using ML for social and health research questions remains fragmented. This may be due to the separate development of research in the computational and data sciences versus the social and health sciences, as well as a lack of accessible overviews and adequate training in ML techniques for researchers outside data science. This paper provides a meta-mapping of research questions in the social and health sciences to appropriate ML approaches, incorporating the requirements for statistical analysis in these disciplines. We map the established classification into description, prediction, and causal inference to common research goals, such as estimating the prevalence of adverse health or social outcomes, predicting the risk of an event, and identifying risk factors or causes of adverse outcomes. This meta-mapping aims to overcome disciplinary barriers and start a fluid dialogue between researchers from the social and health sciences and methodologically trained researchers. Such a mapping may also help to fully exploit the benefits of ML while considering domain-specific aspects relevant to the social and health sciences, and may contribute to accelerating the uptake of ML applications to advance both basic and applied social and health sciences research.

Significance statement

There is great interest in the social and health sciences in the application of machine learning (ML) methods; however, a conceptual mapping of appropriate ML approaches to research questions in the social and health sciences has been lacking. The classification presented here may help to advance the uptake of ML in the social and health sciences while also pointing to possible limitations and ways of addressing them.
