Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists

The 21st century marks the emergence of "big data", with a rapid increase in the availability of datasets that contain multiple measurements. In neuroscience, brain-imaging datasets are increasingly accompanied by dozens or even hundreds of phenotypic subject descriptors at the behavioral, neural, and genomic levels. The complexity of such "big data" repositories offers new opportunities and poses new challenges for systems neuroscience. Canonical correlation analysis (CCA) is a prototypical family of methods for identifying the links between variable sets from different modalities. Importantly, because CCA is well suited to describing relationships across multiple sets of data, it lends itself to the analysis of big neuroscience datasets. Our primer discusses the rationale, promises, and pitfalls of CCA.
