Significant correlation between a set of genetic polymorphisms and a functional brain network revealed by feature selection and sparse Partial Least Squares

Brain imaging is increasingly recognised as an intermediate phenotype for understanding the complex path between genetics and behavioural or clinical phenotypes. In this context, a first goal is to propose methods that identify the part of genetic variability explaining some neuroimaging variability. Classical univariate approaches often ignore the potential joint effects between genes and the potential covariations between brain regions. In this paper, we instead investigate an exploratory multivariate method to identify a set of Single Nucleotide Polymorphisms (SNPs) covarying with a set of neuroimaging phenotypes derived from functional Magnetic Resonance Imaging (fMRI). Recently, Partial Least Squares (PLS) regression and Canonical Correlation Analysis (CCA) have been proposed to jointly analyse DNA and transcriptomic data. Here, we transpose this idea to the DNA vs. imaging context. However, in very high-dimensional settings such as imaging genetics studies, these multivariate methods may overfit. We therefore investigate different regularisation and dimension reduction strategies combined with PLS or CCA to cope with this very high dimensionality. We compare the strategies first on a simulated dataset and then on a real dataset composed of 94 subjects, around 600,000 SNPs and 34 functional MRI lateralisation indices computed from reading and speech comprehension contrast maps. We estimate the generalisability of the multivariate association with a cross-validation scheme and demonstrate the significance of this link using a permutation procedure. Univariate selection appears to be necessary to reduce the dimensionality. However, the significant association uncovered by this two-step approach combining univariate filtering and L1-regularised PLS suggests that discovering meaningful genetic associations calls for a multivariate approach.
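The two-step approach described above (univariate filtering followed by L1-regularised PLS, assessed by cross-validation and permutation) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the function names, the filter criterion (largest squared correlation with any phenotype), the soft-thresholded power iteration used as a stand-in for sparse PLS, the penalty applied to the SNP weights only, the mean per-fold correlation used as the generalisation score, and the small synthetic dataset (94 subjects, 500 simulated SNPs, 34 phenotypes).

```python
import numpy as np

def soft_threshold(a, lam):
    """Element-wise soft-thresholding, the proximal operator of the L1 penalty."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def univariate_filter(X, Y, k):
    """Indices of the k SNPs with the largest squared correlation with any phenotype."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xc /= np.linalg.norm(Xc, axis=0) + 1e-12
    Yc /= np.linalg.norm(Yc, axis=0) + 1e-12
    r2 = (Xc.T @ Yc) ** 2                      # (n_snps, n_phenotypes)
    return np.argsort(r2.max(axis=1))[-k:]

def sparse_pls_first_pair(X, Y, sparsity=0.5, n_iter=100):
    """First pair of PLS weight vectors, with an L1 penalty on the SNP weights u only.
    `sparsity` in [0, 1) sets the threshold as a fraction of the largest loading."""
    M = X.T @ Y                                # cross-covariance up to a constant
    v = np.linalg.svd(M, full_matrices=False)[2][0]
    for _ in range(n_iter):
        a = M @ v
        u = soft_threshold(a, sparsity * np.abs(a).max())
        u /= np.linalg.norm(u)
        v = M.T @ u
        v /= np.linalg.norm(v)
    return u, v

def safe_corr(a, b):
    """Pearson correlation, returning 0.0 when one of the inputs is constant."""
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return np.corrcoef(a, b)[0, 1]

def cv_score(X, Y, k=50, sparsity=0.5, n_folds=5, seed=0):
    """Mean out-of-sample correlation between SNP and imaging scores.
    Filtering and fitting use the training fold only, so the test fold stays unseen."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    scores = []
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        mx, my = X[train].mean(axis=0), Y[train].mean(axis=0)
        keep = univariate_filter(X[train], Y[train], k)
        u, v = sparse_pls_first_pair(X[train][:, keep] - mx[keep], Y[train] - my, sparsity)
        tx = (X[test][:, keep] - mx[keep]) @ u
        ty = (Y[test] - my) @ v
        scores.append(safe_corr(tx, ty))
    return float(np.mean(scores))

def permutation_p_value(X, Y, n_perm=20, seed=0, **kwargs):
    """Permute subjects in Y, rerun the whole pipeline, and compare with the observed score."""
    rng = np.random.default_rng(seed)
    observed = cv_score(X, Y, **kwargs)
    null = np.array([cv_score(X, Y[rng.permutation(len(Y))], **kwargs)
                     for _ in range(n_perm)])
    return observed, (1 + np.sum(null >= observed)) / (1 + n_perm)

# Synthetic data (hypothetical): 10 causal SNPs drive 5 of the 34 phenotypes.
rng = np.random.default_rng(42)
n, p, q = 94, 500, 34
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)   # genotypes coded 0/1/2
latent = X[:, :10].sum(axis=1)
Y = rng.normal(size=(n, q))
Y[:, :5] += 0.5 * (latent - latent.mean())[:, None]

keep = univariate_filter(X, Y, 50)
u, v = sparse_pls_first_pair(X[:, keep] - X[:, keep].mean(axis=0), Y - Y.mean(axis=0))
observed, p_value = permutation_p_value(X, Y, n_perm=20)
print(f"non-zero SNP weights: {np.count_nonzero(u)}, "
      f"CV correlation: {observed:.3f}, permutation p-value: {p_value:.3f}")
```

Running the filter inside each training fold, and again inside every permutation, is the design choice that matters in this sketch: selecting SNPs on the full dataset before cross-validating would leak information from the test subjects and inflate the apparent association.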
