ProGeM: a framework for the prioritization of candidate causal genes at molecular quantitative trait loci

Abstract

Quantitative trait locus (QTL) mapping of molecular phenotypes such as metabolites, lipids and proteins through genome-wide association studies represents a powerful means of highlighting molecular mechanisms relevant to human diseases. However, a major challenge of this approach is to identify the causal gene(s) at the observed QTLs. Here, we present a framework for the ‘Prioritization of candidate causal Genes at Molecular QTLs’ (ProGeM), which incorporates biological domain-specific annotation data alongside genome annotation data from multiple repositories. We assessed the performance of ProGeM using a reference set of 227 previously reported and extensively curated metabolite QTLs. For 98% of these loci, the expert-curated gene was one of the candidate causal genes prioritized by ProGeM. Benchmarking analyses revealed that 69% of the causal candidates were nearest to the sentinel variant at the investigated molecular QTLs, indicating that genomic proximity is the most reliable indicator of ‘true positive’ causal genes. In contrast, cis-gene expression QTL data led to three false positive candidate causal gene assignments for every one true positive assignment. We provide evidence that these conclusions also apply to other molecular phenotypes, suggesting that ProGeM is a powerful and versatile tool for annotating molecular QTLs. ProGeM is freely available via GitHub.
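The benchmarking quantities mentioned in the abstract can be illustrated with a toy calculation. The following is a minimal Python sketch and not ProGeM's actual implementation: the gene names, coordinates, and helper functions (`distance_to_gene`, `nearest_gene`, `sensitivity`, `precision`) are hypothetical and exist only for illustration. It shows how a nearest-gene assignment for a sentinel variant might be made, how sensitivity against an expert-curated reference set could be computed, and why three false positives per true positive corresponds to a precision of 25%.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Gene:
    symbol: str  # hypothetical gene label
    start: int   # gene start coordinate in base pairs (toy value)
    end: int     # gene end coordinate in base pairs (toy value)


def distance_to_gene(position: int, gene: Gene) -> int:
    """Distance in bp from a variant to a gene body; 0 if the variant lies inside it."""
    if gene.start <= position <= gene.end:
        return 0
    return min(abs(position - gene.start), abs(position - gene.end))


def nearest_gene(position: int, genes: List[Gene]) -> Gene:
    """Return the gene closest to the sentinel variant (assumes one chromosome)."""
    return min(genes, key=lambda g: distance_to_gene(position, g))


def sensitivity(candidates: Dict[str, Set[str]], curated: Dict[str, str]) -> float:
    """Fraction of loci whose curated gene appears among the prioritized candidates."""
    hits = sum(
        1 for locus, gene in curated.items()
        if gene in candidates.get(locus, set())
    )
    return hits / len(curated)


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of candidate assignments that are true positives."""
    return true_positives / (true_positives + false_positives)


# Toy locus with made-up coordinates: the sentinel variant sits between two genes.
toy_genes = [Gene("GENE_A", 100, 200), Gene("GENE_B", 500, 900)]
print(nearest_gene(450, toy_genes).symbol)

# Three false positives for every true positive gives a precision of 0.25.
print(precision(true_positives=1, false_positives=3))
```

In this sketch, sensitivity corresponds to the kind of figure reported as 98% in the abstract, and precision to the 1-in-4 ratio described for cis-eQTL-based assignments. How ProGeM itself defines and computes these measures is described in the paper, not here.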
