Finding the needle in high-dimensional haystack: A tutorial on canonical correlation analysis

Since the beginning of the 21st century, the size, breadth, and granularity of data in biology and medicine have grown rapidly. In neuroscience, for example, studies with thousands of subjects are becoming more common, providing extensive phenotyping at the behavioral, neural, and genomic levels with hundreds of variables. The complexity of such big data repositories offers new opportunities and poses new challenges for investigating brain, cognition, and disease. Canonical correlation analysis (CCA) is a prototypical family of methods for wrestling with and harvesting insight from such rich datasets. This doubly multivariate tool can simultaneously consider two variable sets from different modalities to uncover essential hidden associations. Our primer discusses the rationale, promises, and pitfalls of CCA in biomedicine.
