Candidate disease gene prediction using Gentrepid: application to a genome-wide association study on coronary artery disease

Current single‐locus‐based analyses and candidate disease gene prediction methodologies used in genome‐wide association studies (GWAS) capitalize neither on the wealth of the underlying genetic data nor on the functional data available from molecular biology. Here, we used Gentrepid to analyze GWAS data on coronary artery disease (CAD) from the Wellcome Trust Case Control Consortium (WTCCC). Gentrepid uses a multiple‐locus‐based approach, drawing on protein pathway‐ or domain‐based data to make predictions. Known disease genes may be used as additional information (seeded method), or predictions can be based entirely on GWAS single nucleotide polymorphisms (SNPs) (ab initio method). We examined the specific predictions Gentrepid made for CAD in detail and compared them with known genetic data and the scientific literature. Gentrepid extracted known disease genes from the candidate search space and predicted plausible novel disease genes from both known and novel WTCCC‐implicated loci. The disease gene candidates are consistent with known biological information. The results demonstrate that this computational approach is feasible and that it is a valuable discovery tool for geneticists.
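The difference between the two prediction modes can be illustrated with a small sketch. It assumes each GWAS‐implicated locus has already been mapped to a list of candidate genes and that each gene carries a set of pathway or domain annotations. The functions `ab_initio_candidates` and `seeded_candidates`, and all locus, gene, and annotation names, are hypothetical. This illustrates only the general multiple‐locus principle; it is not Gentrepid's implementation and includes no statistical assessment of chance co‐annotation.

```python
from collections import defaultdict
from itertools import combinations


def ab_initio_candidates(locus_genes, gene_annotations):
    """Hypothetical ab initio mode: keep genes that share a pathway or domain
    annotation with a gene at a *different* GWAS locus."""
    # Index each annotation to the (locus, gene) pairs that carry it.
    annotation_index = defaultdict(set)
    for locus, genes in locus_genes.items():
        for gene in genes:
            for ann in gene_annotations.get(gene, ()):
                annotation_index[ann].add((locus, gene))

    candidates = set()
    for members in annotation_index.values():
        # Only cross-locus sharing counts; same-locus pairs are ignored.
        for (locus_a, gene_a), (locus_b, gene_b) in combinations(members, 2):
            if locus_a != locus_b:
                candidates.update((gene_a, gene_b))
    return candidates


def seeded_candidates(locus_genes, gene_annotations, seed_genes):
    """Hypothetical seeded mode: keep locus genes that share an annotation
    with any known disease gene supplied as a seed."""
    seed_annotations = set()
    for gene in seed_genes:
        seed_annotations.update(gene_annotations.get(gene, ()))
    return {
        gene
        for genes in locus_genes.values()
        for gene in genes
        if seed_annotations & set(gene_annotations.get(gene, ()))
    }


# Toy, hypothetical data: loci mapped to genes, genes mapped to annotations.
locus_genes = {
    "locus1": ["GENE_A", "GENE_B"],
    "locus2": ["GENE_C"],
    "locus3": ["GENE_D", "GENE_E"],
}
gene_annotations = {
    "GENE_A": {"pathway_X"},
    "GENE_B": {"domain_Y"},
    "GENE_C": {"pathway_X"},
    "GENE_D": {"domain_Z"},
    "GENE_E": {"domain_Y"},
    "SEED_1": {"domain_Z"},  # a known disease gene outside the GWAS loci
}

print(sorted(ab_initio_candidates(locus_genes, gene_annotations)))
# → ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_E']
print(sorted(seeded_candidates(locus_genes, gene_annotations, ["SEED_1"])))
# → ['GENE_D']
```

In the ab initio mode, the only evidence is that genes at separate loci converge on a shared pathway or domain. In the seeded mode, prior knowledge (the seed genes) supplies the annotations that locus genes are matched against, so a single locus can yield a candidate.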
