Recent approaches to the prioritization of candidate disease genes

Considerable effort is still devoted to discovering genes involved in specific phenotypes, in particular diseases. High‐throughput techniques are frequently applied for this purpose and often yield dozens or even hundreds of candidate genes. However, experimental validation of many candidates is often expensive and time‐consuming. Therefore, a great variety of computational approaches has been developed to support the identification of the most promising candidates for follow‐up studies. These approaches commonly exploit the biomedical knowledge already available about the disease of interest and related genes to find new gene–disease associations and to prioritize candidates. In this review, we highlight recent methodological advances in candidate gene prioritization. We focus on approaches that use network information and integrate heterogeneous data sources. Furthermore, we discuss current benchmarking procedures for evaluating and comparing different prioritization methods. WIREs Syst Biol Med 2012. doi: 10.1002/wsbm.1177
