Enforcing Co-Expression Within a Brain-Imaging Genomics Regression Framework

Among the challenges arising in brain imaging genetic studies, estimating the potential links between neurological and genetic variability within a population is key. In this paper, we propose a multivariate, multimodal formulation for variable selection that leverages co-expression patterns across data modalities. Our approach is based on an intuitive combination of two widely used statistical models: sparse regression and canonical correlation analysis (CCA). While the former seeks multivariate linear relationships between a given phenotype and associated observations, the latter seeks to extract co-expression patterns between sets of variables belonging to different modalities. We propose to rely on a “CCA-type” formulation to regularize the classical multimodal sparse regression problem, essentially incorporating both the CCA and regression models within a unified formulation. The underlying motivation is to extract discriminative variables that are also co-expressed across modalities. We first show that the simplest formulation of such a model can be expressed as a special case of collaborative learning methods. After discussing its limitation, we propose an extended, more flexible formulation and introduce a simple and efficient alternating minimization algorithm to solve the associated optimization problem. We explore the parameter space and provide some guidelines regarding parameter selection. Both the original and extended versions are then compared on a simple toy data set and a more advanced simulated imaging genomics data set to illustrate the benefits of the latter. Finally, we validate the proposed formulation using single nucleotide polymorphism (SNP) data and functional magnetic resonance imaging (fMRI) data from a population of adolescents ($n = 362$ subjects, age 16.9 ± 1.9 years, from the Philadelphia Neurodevelopmental Cohort) for the study of learning ability. Furthermore, we carry out a significance analysis of the resulting features, which allows us to carefully extract brain regions and genes linked to learning and cognitive ability.
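To make the idea of a CCA-regularized sparse regression solved by alternating minimization more concrete, the following is a minimal sketch under stated assumptions, not the formulation or algorithm of the paper. It assumes two modalities X1 and X2, a phenotype y, and an illustrative objective 0.5·||y − X1 b1 − X2 b2||² + 0.5·gamma·||X1 b1 − X2 b2||² + lam·(||b1||₁ + ||b2||₁), where the middle term is a stand-in for the "CCA-type" co-expression penalty. The function names, the parameters gamma and lam, and the choice of proximal-gradient (soft-thresholding) updates for each block are all hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(y, X1, X2, b1, b2, gamma, lam):
    """Illustrative objective (an assumption, not the paper's exact model)."""
    u1, u2 = X1 @ b1, X2 @ b2
    fit = 0.5 * np.sum((y - u1 - u2) ** 2)          # regression loss
    coexp = 0.5 * gamma * np.sum((u1 - u2) ** 2)    # CCA-type agreement term
    sparsity = lam * (np.abs(b1).sum() + np.abs(b2).sum())
    return fit + coexp + sparsity


def fit_alternating(y, X1, X2, gamma=1.0, lam=0.1, n_outer=200, n_inner=5):
    """Alternate proximal-gradient updates of b1 (b2 fixed) and b2 (b1 fixed)."""
    b1 = np.zeros(X1.shape[1])
    b2 = np.zeros(X2.shape[1])
    # Lipschitz constants of the smooth part with respect to each block.
    L1 = max((1.0 + gamma) * np.linalg.norm(X1, 2) ** 2, 1e-12)
    L2 = max((1.0 + gamma) * np.linalg.norm(X2, 2) ** 2, 1e-12)
    for _ in range(n_outer):
        for _ in range(n_inner):  # update b1 with b2 held fixed
            u1, u2 = X1 @ b1, X2 @ b2
            grad = X1.T @ (-(y - u1 - u2) + gamma * (u1 - u2))
            b1 = soft_threshold(b1 - grad / L1, lam / L1)
        for _ in range(n_inner):  # update b2 with b1 held fixed
            u1, u2 = X1 @ b1, X2 @ b2
            grad = X2.T @ (-(y - u1 - u2) - gamma * (u1 - u2))
            b2 = soft_threshold(b2 - grad / L2, lam / L2)
    return b1, b2
```

In this sketch, a larger gamma pushes the two modality-specific predictions X1 b1 and X2 b2 toward agreement, while a larger lam drives more coefficients to exactly zero; with a step size of 1/L each block update cannot increase the objective, which is the usual reason for preferring such simple alternating schemes.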
