Bridging heterogeneous mutation data to enhance disease gene discovery

Bridging heterogeneous mutation data fills the gap between different data categories and advances the discovery of disease-related genes. Genome-wide association studies (GWAS) identify significant mutation associations that link genotype to phenotype. However, because GWAS differ in sample size and data quality, not all truly important variants pass multiple-testing correction. Meanwhile, mutation events widely reported in the literature reveal typical functional biological processes, described by mutation types such as gain of function and loss of function. To bring these heterogeneous mutation data together, we propose a pipeline, 'Gene-Disease Association prediction by Mutation Data Bridging (GDAMDB)', built on a statistical generative model. The model learns the distribution parameters of mutation associations and mutation types. It then recovers false-negative GWAS mutations, meaning those that fail the significance test but are supported in the literature by evidence of functional biological processes. Finally, we applied GDAMDB to Alzheimer's disease (AD) and predicted 79 AD-associated genes. Of these, 12 come from the original GWAS, 60 are supported as AD-related by other GWAS or literature reports, and the rest are newly predicted. Our model enhances GWAS-based gene association discovery by effectively incorporating text-mining results. These positive results indicate that bridging heterogeneous mutation data contributes to the discovery of novel disease-related genes.
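The abstract does not specify the generative model, so the following is only a hedged illustration of the general idea and not GDAMDB itself. It fits a two-group Gaussian mixture to GWAS z-scores by expectation-maximization (EM). The prior probability that a variant is truly associated is allowed to differ depending on whether the literature reports a gain- or loss-of-function mutation for the gene. A variant that misses the conventional genome-wide threshold (p < 5e-8) can then still receive a high posterior probability of association when it has literature support. The toy data, the sign convention (z-scores oriented so that associated variants have positive mean), and all names (`fit_em`, `responsibility`, `two_sided_p`, `VARIANTS`) are hypothetical.

```python
import math

# Hypothetical toy data: (GWAS z-score, literature reports a GOF/LOF mutation for the gene?)
VARIANTS = [
    (5.0, True), (4.8, True), (5.2, True), (4.9, True), (0.1, True), (-0.2, True),
    (0.3, False), (-0.5, False), (0.8, False), (-1.0, False), (0.0, False),
    (5.1, False), (0.2, False), (-0.3, False), (0.5, False), (-0.7, False),
]

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def two_sided_p(z):
    """Two-sided p-value of a standard-normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def _clamp(x, lo=1e-6, hi=1.0 - 1e-6):
    """Keep mixing weights strictly inside (0, 1) so their logs stay finite."""
    return min(max(x, lo), hi)


def responsibility(z, pi, mu):
    """Posterior P(associated | z) under pi*N(mu, 1) + (1 - pi)*N(0, 1)."""
    # Log of each weighted component density; the shared -0.5*log(2*pi) term cancels.
    log_alt = math.log(pi) - 0.5 * (z - mu) ** 2
    log_null = math.log(1.0 - pi) - 0.5 * z ** 2
    d = log_alt - log_null  # log-odds of "associated" vs "null"
    # Numerically stable logistic function of the log-odds.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def fit_em(variants, mu=3.0, n_iter=200):
    """EM with one mixing weight per literature group and a shared effect mean mu."""
    pi = {True: 0.5, False: 0.5}
    for _ in range(n_iter):
        # E-step: posterior probability that each variant is truly associated.
        r = [responsibility(z, pi[lit], mu) for z, lit in variants]
        # M-step: re-estimate each group's prior weight from its responsibilities.
        for group in (True, False):
            rs = [ri for ri, (_, lit) in zip(r, variants) if lit == group]
            if rs:
                pi[group] = _clamp(sum(rs) / len(rs))
        # M-step: re-estimate the mean z-score of truly associated variants.
        mu = sum(ri * z for ri, (z, _) in zip(r, variants)) / sum(r)
    return pi, mu


pi, mu = fit_em(VARIANTS)
z_new = 3.0  # a sub-threshold variant
print(two_sided_p(z_new) < GENOME_WIDE_ALPHA)  # False: fails the genome-wide threshold
print("with literature support:", responsibility(z_new, pi[True], mu))
print("without literature support:", responsibility(z_new, pi[False], mu))
```

Because the fitted prior weight is higher in the literature-supported group, the same sub-threshold z-score yields a higher posterior probability of association when literature evidence exists. This is the sense in which text-mined mutation types can rescue GWAS false negatives in this toy setting.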
