Computational approaches to disease‐gene prediction: rationale, classification and successes

The identification of genes involved in human hereditary diseases often requires the time‐consuming and expensive examination of a great number of possible candidate genes, since genome‐wide techniques such as linkage analysis and association studies frequently select many hundreds of ‘positional’ candidates. Even considering the positive impact of next‐generation sequencing technologies, the prioritization of candidate genes may be an important step for disease‐gene identification. In this paper we develop a basic classification scheme for computational approaches to disease‐gene prediction and apply it to exhaustively review bioinformatics tools that have been developed for this purpose, focusing on conceptual aspects rather than technical detail and performance. Finally, we discuss some past successes obtained by computational approaches to illustrate their beneficial contribution to medical research.


[53]  Pall I. Olason,et al.  A human phenome-interactome network of protein complexes implicated in genetic disorders , 2007, Nature Biotechnology.

[54]  Carolina Perez-Iratxeta,et al.  Towards completion of the Earth's proteome , 2007, EMBO reports.

[55]  M. McCarthy,et al.  Genome-wide association studies for complex traits: consensus, uncertainty and challenges , 2008, Nature Reviews Genetics.

[56]  Yoshihiro Yamanishi,et al.  KEGG for linking genomes to life and the environment , 2007, Nucleic Acids Res..

[57]  Dragomir R. Radev,et al.  Identifying gene-disease associations using centrality on a literature mined gene-interaction network , 2008, ISMB.

[58]  Bart De Moor,et al.  Comparison of vocabularies, representations and ranking algorithms for gene prioritization by text mining , 2008, ECCB.

[59]  P. Robinson,et al.  Walking the interactome for prioritization of candidate disease genes. , 2008, American journal of human genetics.

[60]  J. A. Lozano,et al.  Prioritization of candidate cancer genes—an aid to oncogenomic studies , 2008, Nucleic acids research.

[61]  Michael Q. Zhang,et al.  Network-based global inference of human disease genes , 2008, Molecular systems biology.

[62]  P. Radivojac,et al.  An integrated approach to inferring gene–disease associations in humans , 2008, Proteins.

[63]  Bart De Moor,et al.  Endeavour update: a web resource for gene prioritization in multiple species , 2008, Nucleic Acids Res..

[64]  Rosario M. Piro,et al.  Prediction of Human Disease Genes by Human-Mouse Conserved Coexpression Analysis , 2008, PLoS Comput. Biol..

[65]  P. Provero,et al.  Functional Annotation and Identification of Candidate Disease Genes by Computational Analysis of Normal Tissue Gene Expression Data , 2008, PloS one.

[66]  M. Huynen,et al.  Phenome connections. , 2008, Trends in genetics : TIG.

[67]  P. Robinson,et al.  The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. , 2008, American journal of human genetics.

[68]  You-Qiang Song,et al.  Prediction of osteoporosis candidate genes by computational disease-gene identification strategy , 2008, Journal of Human Genetics.

[69]  E. Mardis The impact of next-generation sequencing technology on genetics. , 2008, Trends in genetics : TIG.

[70]  Mathieu Lemire,et al.  Genes to Diseases (G2D) Computational Method to Identify Asthma Candidate Genes , 2008, PloS one.

[71]  Robert D. Finn,et al.  InterPro: the integrative protein signature database , 2008, Nucleic Acids Res..

[72]  E. Snitkin,et al.  Genome-wide prioritization of disease genes and identification of disease-disease associations from an integrated human functional linkage network , 2009, Genome Biology.

[73]  Allan R. Jones,et al.  The Allen Brain Atlas: 5 years and beyond , 2009, Nature Reviews Neuroscience.

[74]  J. Gécz,et al.  Lessons learnt from large-scale exon re-sequencing of the X chromosome , 2009, Human molecular genetics.

[75]  Andrew Menzies,et al.  A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation , 2009, Nature Genetics.

[76]  W. Kibbe,et al.  Annotating the human genome with Disease Ontology , 2009, BMC Genomics.

[77]  Yves Moreau,et al.  Network Analysis of Differential Expression for the Identification of Disease-Causing Genes , 2009, PloS one.

[78]  Chiranjib Bhattacharyya,et al.  Prediction of Candidate Primary Immunodeficiency Disease Genes Using a Support Vector Machine Learning Approach , 2009, DNA research : an international journal for rapid publication of reports on genes and genomes.

[79]  M. Gerstein,et al.  RNA-Seq: a revolutionary tool for transcriptomics , 2009, Nature Reviews Genetics.

[80]  Alan F. Scott,et al.  McKusick's Online Mendelian Inheritance in Man (OMIM®) , 2008, Nucleic Acids Res..

[81]  Christian von Mering,et al.  STRING 8—a global view on proteins and their functional interactions in 630 organisms , 2008, Nucleic Acids Res..

[82]  K. Frazer,et al.  Human genetic variation and its contribution to complex traits , 2009, Nature Reviews Genetics.

[83]  Jing Chen,et al.  ToppGene Suite for gene list enrichment analysis and candidate gene prioritization , 2009, Nucleic Acids Res..

[84]  Dennis B. Troup,et al.  NCBI GEO: archive for high-throughput functional genomic data , 2008, Nucleic Acids Res..

[85]  Sandhya Rani,et al.  Human Protein Reference Database—2009 update , 2008, Nucleic Acids Res..

[86]  Carolina Perez-Iratxeta,et al.  Linking genes to diseases: it's all in the data , 2009, Genome Medicine.

[87]  A. J. Walhout,et al.  Gene-centered regulatory networks. , 2010, Briefings in functional genomics.

[88]  Maricel G. Kann,et al.  Advances in translational bioinformatics: computational approaches for the hunting of disease genes , 2010, Briefings Bioinform..

[89]  Jagdish Chandra Patra,et al.  Genome-wide inferring gene-phenotype relationship by walking on the heterogeneous network , 2010, Bioinform..

[90]  Teri A Manolio,et al.  Genomewide association studies and assessment of the risk of disease. , 2010, The New England journal of medicine.

[91]  A. Barabasi,et al.  Network medicine : a network-based approach to human disease , 2010 .

[92]  Riet De Smet,et al.  Advantages and limitations of current network inference methods , 2010, Nature Reviews Microbiology.

[93]  Yadong Wang,et al.  Prioritization of disease microRNAs through a human phenome-microRNAome network , 2010, BMC Systems Biology.

[94]  D. Altshuler,et al.  A map of human genome variation from population-scale sequencing , 2010, Nature.

[95]  Roded Sharan,et al.  Associating Genes and Protein Complexes with Disease via Network Propagation , 2010, PLoS Comput. Biol..

[96]  Carl Kingsford,et al.  The power of protein interaction networks for associating genes with diseases , 2010, Bioinform..

[97]  Avitan Gefen,et al.  Syndrome to gene (S2G): in‐silico identification of candidate genes for human diseases , 2010, Human mutation.

[98]  Rosario M. Piro,et al.  Candidate gene prioritization based on spatially mapped gene expression: an application to XLMR , 2010, Bioinform..

[99]  J. Gécz,et al.  Mutations in the small GTPase gene RAB39B are responsible for X-linked mental retardation associated with autism, epilepsy, and macrocephaly. , 2010, American journal of human genetics.

[100]  H. Hakonarson,et al.  Using VAAST to identify an X-linked disorder resulting in lethality in male infants due to N-terminal acetyltransferase deficiency. , 2011, American journal of human genetics.

[101]  Wei Chen,et al.  Deep sequencing reveals 50 novel genes for recessive cognitive disorders , 2011, Nature.

[102]  Emmanouil Collab A map of human genome variation from population-scale sequencing , 2011, Nature.

[103]  M. G. Reese,et al.  A probabilistic disease-gene finder for personal genomes. , 2011, Genome research.

[104]  E. Marcotte,et al.  Prioritizing candidate disease genes by network-based boosting of genome-wide association data. , 2011, Genome research.

[105]  H. Deng,et al.  Identification of genes for bone mineral density variation by computational disease gene identification strategy , 2011, Journal of Bone and Mineral Metabolism.

[106]  Ivan Molineris,et al.  An atlas of tissue-specific conserved coexpression for functional annotation and disease gene prediction , 2011, European Journal of Human Genetics.

[107]  R. Piro,et al.  Evaluation of Candidate Genes from Orphan FEB and GEFS+ Loci by Analysis of Human Brain Gene Expression Atlases , 2011, PloS one.