Multiple Holdouts With Stability: Improving the Generalizability of Machine Learning Analyses of Brain–Behavior Relationships

Background: In 2009, the National Institute of Mental Health launched the Research Domain Criteria (RDoC), an attempt to move beyond diagnostic categories and ground psychiatry within neurobiological constructs that combine different levels of measures (e.g., brain imaging and behavior). Statistical methods that can integrate such multimodal data, however, are often vulnerable to overfitting and poor generalization, and their results can be difficult to interpret.

Methods: We propose an innovative machine learning framework that combines multiple holdouts and a stability criterion with regularized multivariate techniques, such as sparse partial least squares and kernel canonical correlation analysis, to identify hidden dimensions of cross-modality relationships. To illustrate the approach, we investigated structural brain–behavior associations in an extensively phenotyped developmental sample of 345 participants (312 healthy and 33 with clinical depression). The brain data consisted of whole-brain voxel-based gray matter volumes, and the behavioral data included item-level self-report questionnaires, IQ, and demographic measures.

Results: Both sparse partial least squares and kernel canonical correlation analysis captured two hidden dimensions of brain–behavior relationships: one related to age and drinking, and the other related to depression. The applied machine learning framework indicates that these results are stable and generalize well to new data. Indeed, the identified brain–behavior associations agree with previous findings in the literature on age-, alcohol use-, and depression-related changes in brain volume.

Conclusions: Multivariate techniques such as sparse partial least squares and kernel canonical correlation analysis, embedded in our novel framework, are promising tools for linking behavior and/or symptoms to neurobiology, and thus have great potential to contribute to a biologically grounded definition of psychiatric disorders.
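The Methods paragraph describes the framework only at a high level. As an illustration, here is a minimal Python sketch of one possible reading of it for a single sparse PLS component; it is not the authors' implementation, and kernel CCA is not covered. Everything in it is an assumption on our part: sparsity is imposed through L1 bounds `cx` and `cy` on unit-norm weight vectors in the style of penalized matrix decomposition (Witten et al.); stability is taken to be the mean pairwise absolute cosine similarity of weight vectors across random splits; hyperparameters are chosen by distance to (1, 1) in the (test correlation, stability) plane; and each holdout is tested by permuting the rows of Y in the optimization set. All function names are hypothetical, and X and Y are assumed to be column-standardized beforehand.

```python
# Hypothetical sketch of a multiple-holdout + stability framework around one
# sparse PLS component. The stability measure, the selection rule and the
# permutation scheme are assumptions, not the published method.
import numpy as np


def l1_project(a, c):
    """Unit-L2 vector proportional to soft_threshold(a, d), with d found by
    binary search so that its L1 norm is at most c (requires c >= 1)."""
    if c < 1:
        raise ValueError("c must be >= 1 for a unit-norm vector")
    norm = np.linalg.norm(a)
    if norm == 0:
        return a
    if np.abs(a).sum() / norm <= c:
        return a / norm  # L1 bound inactive: no shrinkage needed
    k = np.argmax(np.abs(a))
    best = np.zeros_like(a)
    best[k] = np.sign(a[k])  # fallback: keep only the largest entry
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(50):
        d = 0.5 * (lo + hi)
        s = np.sign(a) * np.maximum(np.abs(a) - d, 0.0)
        u = s / np.linalg.norm(s)
        if np.abs(u).sum() > c:
            lo = d  # still too dense: threshold harder
        else:
            hi, best = d, u  # feasible: try a smaller threshold
    return best


def spls(X, Y, cx, cy, n_iter=100, tol=1e-6):
    """One sparse PLS component: alternate L1-bounded updates of u and v that
    approximately maximize u' X'Y v (penalized-matrix-decomposition style)."""
    C = X.T @ Y
    v = np.linalg.svd(C, full_matrices=False)[2][0]  # leading right singular vector
    u = l1_project(C @ v, cx)
    for _ in range(n_iter):
        v_new = l1_project(C.T @ u, cy)
        u_new = l1_project(C @ v_new, cx)
        done = np.allclose(u, u_new, atol=tol) and np.allclose(v, v_new, atol=tol)
        u, v = u_new, v_new
        if done:
            break
    return u, v


def fit_and_test(X_tr, Y_tr, X_te, Y_te, cx, cy):
    """Fit on training rows; return the out-of-sample correlation of latent scores."""
    u, v = spls(X_tr, Y_tr, cx, cy)
    return np.corrcoef(X_te @ u, Y_te @ v)[0, 1], u, v


def select_hyperparams(X, Y, grid, rng, n_splits=5, test_frac=0.2):
    """Assumed rule: pick the (cx, cy) closest to (1, 1) in the plane of mean test
    correlation vs. stability (mean pairwise |cosine| of weights across splits)."""
    n = X.shape[0]
    n_te = int(test_frac * n)
    iu = np.triu_indices(n_splits, k=1)
    best, best_dist = None, np.inf
    for cx, cy in grid:
        rs, us, vs = [], [], []
        for _ in range(n_splits):
            idx = rng.permutation(n)
            te, tr = idx[:n_te], idx[n_te:]
            r, u, v = fit_and_test(X[tr], Y[tr], X[te], Y[te], cx, cy)
            rs.append(r)
            us.append(u)
            vs.append(v)
        stability = np.mean([np.abs(W @ W.T)[iu].mean() for W in (np.array(us), np.array(vs))])
        dist = np.hypot(1 - np.mean(rs), 1 - stability)
        if dist < best_dist:
            best, best_dist = (cx, cy), dist
    return best


def multiple_holdouts(X, Y, grid, n_holdouts=10, holdout_frac=0.1, n_perm=100, seed=0):
    """Repeatedly hold out a random subset, tune on the rest (optimization set),
    then test on the holdout with a permutation p-value."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_ho = int(holdout_frac * n)
    results = []
    for _ in range(n_holdouts):
        idx = rng.permutation(n)
        ho, opt = idx[:n_ho], idx[n_ho:]
        cx, cy = select_hyperparams(X[opt], Y[opt], grid, rng)
        r, u, v = fit_and_test(X[opt], Y[opt], X[ho], Y[ho], cx, cy)
        # Assumed null: shuffle Y rows in the optimization set to break the X-Y
        # pairing, refit with the same (cx, cy), and re-test on the intact holdout.
        Y_opt = Y[opt]
        null = np.array([
            fit_and_test(X[opt], Y_opt[rng.permutation(len(opt))], X[ho], Y[ho], cx, cy)[0]
            for _ in range(n_perm)
        ])
        p = (1 + np.sum(null >= r)) / (1 + n_perm)
        results.append({"r": r, "p": p, "cx": cx, "cy": cy, "u": u, "v": v})
    return results
```

A call such as `multiple_holdouts(X, Y, grid)` returns one record per holdout. One conservative decision rule, assumed here, declares the component significant if any holdout p-value falls below 0.05 / n_holdouts (Bonferroni). Holding the hyperparameters fixed inside the permutation loop keeps the sketch fast; extracting further components would require deflating X and Y, which is omitted, and whole-brain voxel data would need far more memory-conscious code than dense NumPy arrays.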
