Coupled Mixed Model for joint genetic analysis of complex disorders from independently collected data sets: application to Alzheimer's disease and substance use disorder

In the last decade, genome-wide association studies (GWASs) have contributed to decoding the human genome by uncovering genetic variations associated with diseases. "Joint analysis", which involves analyzing multiple independently generated GWAS data sets, has been explored extensively as a follow-up. However, most such analyses remain at a preliminary stage. In this paper, we propose a method, called Coupled Mixed Model (CMM), that performs joint GWAS analysis on two independently collected data sets with different phenotypes, using a set of multivariate sparse mixed models. CMM does not require the data sets to share phenotypes, because it infers the uncollected phenotypes through statistical learning. Moreover, CMM accounts for confounders due to population stratification, family structure, and cryptic relatedness, as well as confounders arising during data collection, which frequently appear in joint genetic studies. We verify the performance of our method using simulation experiments. We also conduct a joint analysis of two independent real data sets that were generated to investigate genetic associations for Alzheimer’s disease and substance use disorder, respectively. Our results reveal new insights into these diseases. A Python implementation of the software is available at: https://github.com/HaohanWang/CMM

Author summary

We present a method, Coupled Mixed Model, that allows users to conduct joint GWAS analysis using independently collected data sets, even when no phenotypes are shared among the subjects in the two data sets. Provided that the phenotypes are related, our method can automatically infer the missing phenotypes in the joint analysis using likelihood estimation.
Joint GWAS analyses across data sets are usually considered challenging because of confounding factors, including population stratification, family structure, cryptic relatedness, and confounders introduced during data collection. Our method is designed to account for these confounders. We verify its performance through extensive simulation experiments, and we further apply it to jointly analyze the common genetic factors underlying both Alzheimer’s disease and substance use disorder.
