Comparison of variants of canonical correlation analysis and partial least squares for combined analysis of MRI and genetic data

The standard analysis approach in neuroimaging genetics studies is mass-univariate linear modeling (MULM). From a statistical point of view, however, this approach has drawbacks: it is computationally intensive, it cannot account for complex multivariate relationships, and its results have to be corrected for multiple testing. In contrast, multivariate methods can combine information from multiple variants to discover meaningful associations between genetic and brain imaging data. We assessed three multivariate techniques, partial least squares correlation (PLSC), sparse canonical correlation analysis (sparse CCA) and Bayesian inter-battery factor analysis (Bayesian IBFA), with respect to their ability to detect multivariate genotype-phenotype associations. Our goal was to systematically compare the performance of these three approaches and to assess their suitability for the high-dimensional, multi-collinear data typical of neuroimaging genetics studies. In a series of simulations using both linearly independent and multi-collinear data, we show that sparse CCA and PLSC are suitable even for very high-dimensional collinear imaging data sets. Of these two, sparse CCA had the higher predictive power when voxel numbers were below 400 times sample size and candidate SNPs were considered. Accordingly, we recommend sparse CCA for candidate-phenotype, candidate-SNP studies. When voxel numbers exceeded 500 times sample size, the predictive power was highest for PLSC. Therefore, PLSC can be considered a promising technique for multivariate modeling of high-dimensional brain-SNP associations. In contrast, Bayesian IBFA cannot be recommended, since additional post-processing steps were necessary to detect causal relations. To verify the applicability of sparse CCA and PLSC, we applied them to an experimental imaging genetics data set provided to us. Most importantly, application of both methods replicated the findings of this data set.
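
For readers unfamiliar with PLSC, the core computation can be illustrated in a few lines. The sketch below is not the authors' pipeline; it assumes the textbook formulation in which PLSC takes the singular value decomposition of the cross-covariance matrix between a column-centered SNP block and a column-centered voxel block. The function name `plsc`, the toy dimensions and the simulated effect are hypothetical choices made only for illustration.

```python
import numpy as np

def plsc(X, Y, n_components=1):
    """Textbook PLS correlation: SVD of the cross-covariance of two blocks.

    X: (n_samples, p) genotype block, Y: (n_samples, q) imaging block.
    Returns SNP weights, singular values, voxel weights and latent scores.
    """
    Xc = X - X.mean(axis=0)              # column-center each block
    Yc = Y - Y.mean(axis=0)
    R = Xc.T @ Yc / (X.shape[0] - 1)     # p x q cross-covariance matrix
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    U = U[:, :n_components]              # SNP saliences (weights)
    V = Vt[:n_components].T              # voxel saliences (weights)
    return U, s[:n_components], V, Xc @ U, Yc @ V

# Toy data (hypothetical): 50 subjects, 10 SNPs coded 0/1/2, 200 voxels.
rng = np.random.default_rng(0)
n, p, q = 50, 10, 200
snps = rng.integers(0, 3, size=(n, p)).astype(float)
voxels = rng.normal(size=(n, q))
voxels[:, :5] += 2.0 * snps[:, [0]]      # SNP 0 influences the first 5 voxels

U, s, V, Lx, Ly = plsc(snps, voxels, n_components=2)
```

Each singular value equals the covariance between the corresponding pair of latent scores, which is the quantity PLSC maximizes. Sparse CCA differs in that it maximizes correlation under sparsity penalties on the weight vectors, so it is not covered by this sketch.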
