Predictive modelling using neuroimaging data in the presence of confounds

Abstract: When training predictive models from neuroimaging data, we typically have available non-imaging variables such as age and gender that affect the imaging data but which we may be uninterested in from a clinical perspective. Such variables are commonly referred to as ‘confounds’. In this work, we first give a working definition of a confound in the context of training predictive models from samples of neuroimaging data. We define a confound as a variable which affects the imaging data and has an association with the target variable in the sample that differs from that in the population-of-interest, i.e., the population over which we intend to apply the estimated predictive model. The focus of this paper is the scenario in which the confound and target variable are independent in the population-of-interest, but the training sample is biased due to a sample association between the target and confound. We then discuss standard approaches for dealing with confounds in predictive modelling, such as image adjustment and including the confound as a predictor, before deriving and motivating an Instance Weighting scheme that attempts to account for confounds by focusing model training so that it is optimal for the population-of-interest. We evaluate the standard approaches and Instance Weighting in two regression problems with neuroimaging data in which we train models in the presence of confounding, and predict samples that are representative of the population-of-interest. For comparison, these models are also evaluated when there is no confounding present. In the first experiment we predict the MMSE score using structural MRI from the ADNI database with gender as the confound, while in the second we predict age using structural MRI from the IXI database with acquisition site as the confound.
Considered over both datasets, we find that none of the methods for dealing with confounding gives more accurate predictions than a baseline model which ignores confounding, and that including the confound as a predictor gives models that are less accurate than the baseline model. We do find, however, that different methods appear to focus their predictions on specific subsets of the population-of-interest, and that predictive accuracy is greater when there is no confounding present. We conclude with a discussion comparing the advantages and disadvantages of each approach, and the implications of our evaluation for building predictive models that can be used in clinical practice.

Highlights
- Definition of confound given from the point of view of predictive modelling.
- Instance Weighting derived for dealing with confounding with continuous targets.
- None of the evaluated methods performs better than a model that ignores confounding.
- Different methods favourably predicted different strata of the population-of-interest.
- Predictive accuracy over the population-of-interest is reduced in the presence of confounding.
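To make the Instance Weighting idea concrete, the following is a minimal sketch and not the paper's own derivation or code. It assumes the weights take the usual importance-weighting form for a population in which target y and confound c are independent, w = p(y)p(c) / p(y, c) = p(c) / p(c | y), and it estimates p(c | y) by quantile-binning y, which is a simplification chosen here for illustration. The function names `instance_weights` and `weighted_ridge`, the bin count, and the synthetic data are all hypothetical.

```python
import numpy as np


def instance_weights(y, c, n_bins=5):
    """Illustrative weights w_i = p(c_i) / p(c_i | y_i).

    p(c | y) is approximated by the frequency of each confound level
    within quantile bins of the continuous target y (an assumption of
    this sketch, not necessarily the estimator used in the paper).
    """
    y = np.asarray(y, dtype=float)
    c = np.asarray(c)
    # Interior quantile edges, so bins hold roughly equal sample counts.
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(y, edges)
    w = np.empty(len(y))
    for level in np.unique(c):
        p_c = np.mean(c == level)  # marginal p(c) in the sample
        for b in np.unique(bins):
            in_bin = bins == b
            sel = in_bin & (c == level)
            if sel.any():
                p_c_given_y = sel.sum() / in_bin.sum()
                w[sel] = p_c / p_c_given_y
    return w / w.mean()  # rescale so the weights average to 1


def weighted_ridge(X, y, w, alpha=1.0):
    """Ridge regression minimising the instance-weighted squared error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    # Weighted centring leaves the intercept unpenalised.
    mu_x = np.average(X, axis=0, weights=w)
    mu_y = np.average(y, weights=w)
    Xc, yc = X - mu_x, y - mu_y
    A = Xc.T @ (w[:, None] * Xc) + alpha * np.eye(X.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (w * yc))
    return beta, mu_y - mu_x @ beta


# Synthetic stand-in for a confounded sample: c shifts both y and the features.
rng = np.random.default_rng(0)
c = rng.integers(0, 2, size=1000)
y = c + rng.normal(size=1000)  # sample association between y and c
X = np.column_stack([y + rng.normal(size=1000), c + rng.normal(size=1000)])
w = instance_weights(y, c)
beta, intercept = weighted_ridge(X, y, w)
```

Under these assumptions, samples whose confound level is over-represented at their target value are down-weighted, so within each target bin the weighted confound proportions match the marginal ones. The binning step could be swapped for any estimator of p(c | y); that choice is left open here.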
