Disease Gene Prioritization Using Network and Feature

Identifying high-confidence candidate genes that are causative for disease phenotypes, from the large lists of variations produced by high-throughput genomics, can be both time-consuming and costly. Novel computational approaches that use existing biological knowledge to prioritize such candidate genes can improve the efficiency and accuracy of biomedical data analysis. They can also reduce the cost of such studies by avoiding experimental validation of irrelevant candidates. In this study, we address this challenge by proposing a novel gene prioritization approach that ranks candidate genes by how likely they are to be involved in the disease or phenotype under study. The algorithm is based on a modified conditional random field (CRF) model that simultaneously makes use of both gene annotations and gene interactions, while preserving their original representation. We validated our approach on two independent disease benchmark studies by ranking candidate genes using network and feature information. Our results showed a high area under the curve (AUC) value (0.86) and, more importantly, a high partial AUC (pAUC) value (0.1296), and revealed higher accuracy and precision at the top predictions than other well-performing gene prioritization tools, such as Endeavour (AUC 0.82, pAUC 0.083) and PINTA (AUC 0.76, pAUC 0.066). We detected more target genes (9/18/19/27) in the top 1/5/10/20 positions than Endeavour (3/11/14/23) and PINTA (6/10/13/18). To demonstrate its usability, we applied our method to a case study on predicting the molecular mechanisms that contribute to intellectual disability and autism. Our approach correctly recovered genes related to both disorders and suggested possible additional candidates based on their rankings and functional annotations.
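
The evaluation measures named above (AUC, pAUC, and the number of target genes recovered in the top k positions) can be illustrated with a minimal sketch. It assumes that AUC refers to the area under the ROC curve and that pAUC is the unnormalized area up to a false-positive-rate cutoff; the abstract does not state the cutoff, so `max_fpr` is a free parameter here. The function names and the toy scores are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return ROC points (FPR, TPR), sweeping the score threshold from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one target and one non-target gene")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Genes with tied scores are added together, giving one ROC point per tie group.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(points: Sequence[Tuple[float, float]], max_fpr: float = 1.0) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr] (unnormalized)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at the cutoff by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def hits_in_top_k(target_ranks: Sequence[int], k: int) -> int:
    """Count target genes whose rank (1 = best) falls within the top k positions."""
    return sum(1 for rank in target_ranks if rank <= k)


# Toy data (hypothetical): prioritization scores for six candidate genes,
# with label 1 marking a known disease gene.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 0, 1, 0, 0, 1]

curve = roc_points(toy_scores, toy_labels)
full_auc = partial_auc(curve, max_fpr=1.0)   # full AUC is the special case max_fpr = 1
early_auc = partial_auc(curve, max_fpr=0.5)  # area restricted to the low-FPR region
top5_hits = hits_in_top_k([1, 3, 6], k=5)    # ranks of the three toy target genes
```

Restricting the area to a low false-positive-rate region rewards methods that place true disease genes near the top of the list, which is why a pAUC and top-k counts are reported alongside the full AUC.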
