A Permutation Approach to Assess Confounding in Machine Learning Applications for Digital Health

Machine learning applications are often plagued by confounders that can impact the generalizability of the learners. In clinical settings, demographic characteristics often play the role of confounders. Confounding is especially problematic in remote digital health studies, where participants self-select to enter the study, making it difficult to balance their demographic characteristics. One effective approach to combat confounding is to match samples with respect to the confounding variables in order to improve the balance of the data. This procedure, however, leads to smaller datasets and hence negatively impacts the inferences drawn from the learners. Alternatively, confounding adjustment methods that make more efficient use of the data (such as inverse probability weighting) usually rely on modeling assumptions, and it is unclear how robust these methods are to violations of those assumptions. Here, instead of proposing a new method to control for confounding, we develop novel permutation-based statistical tools to detect and quantify the influence of observed confounders and to estimate the unconfounded performance of the learner. Our tools can be used to evaluate the effectiveness of existing confounding adjustment methods. We evaluate the statistical properties of our methods in a simulation study and illustrate their application using real-life data from a Parkinson's disease mobile health study collected in an uncontrolled environment.
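
The abstract does not spell out the permutation scheme, so the following is only a minimal sketch of one scheme that fits the description, not the authors' procedure. It assumes a binary label, a single discrete observed confounder, and a fixed prediction score, and it shuffles labels only within confounder levels (a restricted permutation). That keeps the label-confounder association intact while breaking any direct link between score and label, so the resulting null distribution reflects the AUC attributable to the confounder alone. All function names and the toy data are hypothetical.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def restricted_permutation(labels, confounder, rng):
    """Shuffle labels within each confounder level only (hypothetical helper)."""
    permuted = list(labels)
    strata = {}
    for i, c in enumerate(confounder):
        strata.setdefault(c, []).append(i)
    for idx in strata.values():
        shuffled = [labels[i] for i in idx]
        rng.shuffle(shuffled)
        for i, y in zip(idx, shuffled):
            permuted[i] = y
    return permuted


def restricted_permutation_test(scores, labels, confounder, n_perm=200, seed=0):
    """Compare the observed AUC with its restricted-permutation null distribution."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    null = [
        auc(scores, restricted_permutation(labels, confounder, rng))
        for _ in range(n_perm)
    ]
    # One-sided p-value with the usual +1 correction.
    p_value = (1 + sum(a >= observed for a in null)) / (1 + n_perm)
    return observed, sum(null) / n_perm, p_value


# Toy data (an assumption for illustration): the score depends on the label
# only through the confounder, i.e. a purely confounded learner.
data_rng = random.Random(1)
confounder = [data_rng.randint(0, 1) for _ in range(200)]
labels = [int(data_rng.random() < (0.8 if c else 0.2)) for c in confounder]
scores = [c + data_rng.gauss(0, 1) for c in confounder]

observed, null_mean, p_value = restricted_permutation_test(scores, labels, confounder)
print(f"observed AUC={observed:.3f}, restricted-null mean={null_mean:.3f}, p={p_value:.3f}")
```

In this toy setup one would expect the observed AUC to sit above 0.5 yet remain unremarkable relative to the restricted null, which is the signature of performance driven by the confounder. A restricted-null mean well above 0.5 is itself a rough indication of how much the confounder contributes.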
