Fast and Accurate Detection of Complex Imaging Genetics Associations Based on Greedy Projected Distance Correlation

Recent advances in imaging genetics produce large amounts of data, including functional MRI (fMRI) images, single nucleotide polymorphisms (SNPs), and cognitive assessments. Understanding the complex interactions among these heterogeneous and complementary data has the potential to help with the diagnosis and prevention of mental disorders. However, few studies have addressed this problem because the data are high-dimensional, group-structured, and of mixed type. In this paper, we present a novel method to detect conditional associations among imaging genetics data. We use projected distance correlation to build a conditional dependency graph among high-dimensional mixed data, and then use multiple testing to detect significant group-level associations (e.g., between regions of interest and genes). In addition, we introduce a scalable algorithm based on the orthogonal greedy algorithm, yielding the greedy projected distance correlation (G-PDC). This reduces the computational cost, which is critical for analyzing large volumes of imaging genomics data. Our simulation results show that G-PDC is more accurate than distance correlation, Pearson’s correlation, and partial correlation, especially when the association is nonlinear. Finally, we apply our method to the Philadelphia Neurodevelopmental Cohort, using 866 samples with fMRI images and SNP profiles. The results uncover several statistically significant and biologically interesting interactions, which are further validated by existing studies. The MATLAB code is available at https://sites.google.com/site/jianfang86/gPDC.
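
To illustrate why a distance-based measure can help when the association is nonlinear, the following is a minimal sketch of the plain sample distance correlation, compared with Pearson's correlation on a toy quadratic relationship. It is not the G-PDC method itself: there is no projection step, no greedy selection, and no multiple testing. The function names (`centered_distances`, `distance_correlation`, `pearson`) and the toy data are assumptions made for illustration only, and the sketch is written in pure Python rather than the authors' MATLAB.

```python
import math


def centered_distances(points):
    """Pairwise Euclidean distance matrix, double-centered by row, column, and grand means."""
    n = len(points)
    d = [[math.dist(a, b) for b in points] for a in points]
    row_mean = [sum(r) / n for r in d]  # the matrix is symmetric, so row means equal column means
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def distance_correlation(xs, ys):
    """Sample distance correlation between two equally long lists of points (tuples)."""
    n = len(xs)
    a = centered_distances(xs)
    b = centered_distances(ys)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for r in a for v in r) / n**2
    dvar_y = sum(v * v for r in b for v in r) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


def pearson(x, y):
    """Pearson's correlation between two equally long lists of scalars."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


# Toy data (hypothetical): y depends on x only through a symmetric quadratic.
x = [-1 + 2 * i / 20 for i in range(21)]
y = [xi**2 for xi in x]

r_pearson = pearson(x, y)
r_dcor = distance_correlation([(xi,) for xi in x], [(yi,) for yi in y])
print(f"Pearson: {r_pearson:.3f}, distance correlation: {r_dcor:.3f}")
```

On this toy example Pearson's correlation is essentially zero because the relationship is symmetric about the origin, while the distance correlation is clearly positive. Taking points as tuples keeps the same function usable for multivariate inputs, such as a group of SNPs or the voxels in a region of interest.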

[1]  S. Leal,et al.  Homozygosity mapping reveals mutations of GRXCR1 as a cause of autosomal-recessive nonsyndromic hearing impairment. , 2010, American journal of human genetics.

[2]  Vinod Menon,et al.  Functional connectivity in the resting brain: A network analysis of the default mode hypothesis , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[3]  Christos Davatzikos,et al.  Neuroimaging of the Philadelphia Neurodevelopmental Cohort , 2014, NeuroImage.

[4]  A. Jauch,et al.  3p25.3 microdeletion of GABA transporters SLC6A1 and SLC6A11 results in intellectual disability, epilepsy and stereotypic behavior , 2014, American journal of medical genetics. Part A.

[5]  Vince D. Calhoun,et al.  Group sparse canonical correlation analysis for genomic data integration , 2013, BMC Bioinformatics.

[6]  Yang Feng,et al.  A Projection-based Conditional Dependence Measure with Applications to High-dimensional Undirected Graphical Models. , 2015, Journal of econometrics.

[7]  Alan M. Kwong,et al.  Next-generation genotype imputation service and methods , 2016, Nature Genetics.

[8]  K. Takamiya,et al.  Differential expression of isoforms of PSD‐95 binding protein (GKAP/SAPAP1) during rat brain development 1 , 1997, FEBS letters.

[9]  Daniela M Witten,et al.  Extensions of Sparse Canonical Correlation Analysis with Applications to Genomic Data , 2009, Statistical applications in genetics and molecular biology.

[10]  Chia-Hsiang Chen,et al.  Genetic analysis of the DLGAP1 gene as a candidate gene for schizophrenia , 2013, Psychiatry Research.

[11]  Pradeep Ravikumar,et al.  Mixed Graphical Models via Exponential Families , 2014, AISTATS.

[12]  Steve Horvath,et al.  WGCNA: an R package for weighted correlation network analysis , 2008, BMC Bioinformatics.

[13]  Margaret A. Pericak-Vance,et al.  A genome-wide scan for common alleles affecting risk for autism , 2010, Human molecular genetics.

[14]  E. Bullmore,et al.  A Resilient, Low-Frequency, Small-World Human Brain Functional Network with Highly Connected Association Cortical Hubs , 2006, The Journal of Neuroscience.

[15]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[16]  Bernhard Schölkopf,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[17]  Wei Chen,et al.  FastGGM: An Efficient Algorithm for the Inference of Gaussian Graphical Model in Biological Networks , 2016, PLoS Comput. Biol..

[18]  Pascal Sarda,et al.  Factor models and variable selection in high-dimensional regression analysis , 2011 .

[19]  R. Watts,et al.  Sensitivity to posed and genuine displays of happiness and sadness: A fMRI study , 2012, Neuroscience Letters.

[20]  John Blangero,et al.  MACROD2 gene associated with autistic-like traits in a general population sample , 2014, Psychiatric genetics.

[21]  D. Weinberger,et al.  Imaging Genetics: Perspectives from Studies of Genetically Driven Variation in Serotonin Function and Corticolimbic Affective Processing , 2006, Biological Psychiatry.

[22]  P. Chauvel,et al.  The Role of Semiology in the Work-Up of Frontal Lobe Epilepsy: In the Eye of the Beholder , 2014 .

[23]  Bob L. Sturm,et al.  Comparison of orthogonal matching pursuit implementations , 2012, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO).

[24]  R. Díaz,et al.  Specific cerebellar and cortical degeneration correlates with ataxia severity in spinocerebellar ataxia type 7 , 2015, Brain Imaging and Behavior.

[25]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[26]  David A. Pearce,et al.  Reelin signaling is impaired in autism , 2005, Biological Psychiatry.

[27]  D. Tritchler,et al.  Sparse Canonical Correlation Analysis with Application to Genomic Data Integration , 2009, Statistical applications in genetics and molecular biology.

[28]  Tong Zhang,et al.  On the Consistency of Feature Selection using Greedy Least Squares Regression , 2009, J. Mach. Learn. Res..

[29]  V. Calhoun,et al.  An introductory review of parallel independent component analysis (p-ICA) and a guide to applying p-ICA to genetic data and imaging phenotypes to identify disease-associated biological pathways and systems in common complex disorders , 2015, Front. Genet..

[30]  A. Battaglia,et al.  6p25 interstitial deletion in two dizygotic twins with gyral pattern anomaly and speech and language disorder. , 2013, European journal of paediatric neurology : EJPN : official journal of the European Paediatric Neurology Society.

[31]  Scott Peltier,et al.  Abnormalities of intrinsic functional connectivity in autism spectrum disorders, , 2009, NeuroImage.

[32]  Pierre Comon,et al.  Independent component analysis, A new concept? , 1994, Signal Process..

[33]  Jianqing Fan,et al.  Decorrelation of Covariates for High Dimensional Sparse Regression , 2016 .

[34]  Thomas E. Nichols,et al.  Discovering genetic associations with high-dimensional neuroimaging phenotypes: A sparse reduced-rank regression approach , 2010, NeuroImage.

[35]  Yong He,et al.  BrainNet Viewer: A Network Visualization Tool for Human Brain Connectomics , 2013, PloS one.

[36]  Stefano Diciotti,et al.  Neurodegeneration in friedreich's ataxia is associated with a mixed activation pattern of the brain. A fMRI study , 2012, Human brain mapping.

[37]  V. Calhoun,et al.  Multimodal fusion of brain imaging data: A key to finding the missing link(s) in complex mental illness. , 2016, Biological psychiatry. Cognitive neuroscience and neuroimaging.

[38]  Yu Zhang,et al.  The Human Brainnetome Atlas: A New Brain Atlas Based on Connectional Architecture , 2016, Cerebral cortex.

[39]  M. Chun,et al.  Functional connectome fingerprinting: Identifying individuals based on patterns of brain connectivity , 2015, Nature Neuroscience.

[40]  Maria L. Rizzo,et al.  Partial Distance Correlation with Methods for Dissimilarities , 2013, 1310.2926.

[41]  Gábor J. Székely,et al.  The distance correlation t-test of independence in high dimension , 2013, J. Multivar. Anal..

[42]  Maria L. Rizzo,et al.  Measuring and testing dependence by correlation of distances , 2007, 0803.4101.

[43]  Manuel A. R. Ferreira,et al.  PLINK: a tool set for whole-genome association and population-based linkage analyses. , 2007, American journal of human genetics.

[44]  Sara A. Schmidt,et al.  The effect of mild-to-moderate hearing loss on auditory and emotion processing networks , 2014, Front. Syst. Neurosci..

[45]  Vince D. Calhoun,et al.  A review of multivariate analyses in imaging genetics , 2014, Front. Neuroinform..

[46]  O. Andreassen,et al.  Delayed stabilization and individualization in connectome development are related to psychiatric disorders , 2017, Nature Neuroscience.

[47]  Mark A. Elliott,et al.  The Philadelphia Neurodevelopmental Cohort: A publicly available resource for the study of normal and abnormal brain development in youth , 2016, NeuroImage.

[48]  S. Keleş,et al.  Sparse partial least squares regression for simultaneous dimension reduction and variable selection , 2010, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[49]  Larry A. Wasserman,et al.  SpAM: Sparse Additive Models , 2007, NIPS.

[50]  Vince D. Calhoun,et al.  Polymorphism of DCDC2 Reveals Differences in Cortical Morphology of Healthy Individuals—A Preliminary Voxel Based Morphometry Study , 2008, Brain Imaging and Behavior.

[51]  Thomas E. Nichols,et al.  Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate , 2002, NeuroImage.

[52]  Andreas Meyer-Lindenberg,et al.  The future of fMRI and genetics research , 2012, NeuroImage.

[53]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[54]  Lars T. Westlye,et al.  Comparison of variants of canonical correlation analysis and partial least squares for combined analysis of MRI and genetic data , 2015, NeuroImage.

[55]  T. Lai,et al.  A STEPWISE REGRESSION METHOD AND CONSISTENT MODEL SELECTION FOR HIGH-DIMENSIONAL SPARSE LINEAR MODELS , 2011 .

[56]  Vince D. Calhoun,et al.  A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data , 2009, NeuroImage.