GoD: An R-package based on ontologies for prioritization of genes with respect to diseases

Abstract Omics sciences are widely used to analyze diseases at a molecular level. Omics experiments usually yield a large list of candidate genes, proteins or other molecules. Interpreting these results and filtering the candidates is often challenging, particularly in clinical scenarios where researchers are interested in the behaviour of a few molecules related to a specific disease. Filtering requires domain-specific knowledge, which is often encoded in ontologies. To support this interpretation, we implemented GoD (Gene ranking based on Diseases), an algorithm that ranks a given set of genes based on ontology annotations. The algorithm orders genes by the semantic similarity between the annotations of each gene and those describing the selected disease. As a proof of principle, we tested GoD with the Human Phenotype Ontology (HPO), the Gene Ontology (GO) and the Disease Ontology (DO), using semantic similarity measures. GoD is publicly available for academic use at https://sites.google.com/site/geneontologyprioritization/ .
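The ranking idea described in the abstract can be illustrated with a minimal sketch. It is written in Python and does not reflect the interface of the GoD R package. The toy ontology, the annotation probabilities, the gene names, and the choice of Resnik similarity combined with a best-match average are all assumptions made for illustration; the abstract does not state which measures GoD uses.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (a small DAG).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}

# Hypothetical annotation probabilities p(t): fraction of annotated
# entities attached to term t or to any of its descendants.
PROB = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25, "B1": 0.5}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log2 p(t); rarer terms are more informative."""
    return math.log2(1.0 / PROB[term])


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)


def best_match_average(terms_a, terms_b):
    """Combine term-to-term scores into one set-to-set score (assumed strategy)."""
    row = [max(resnik(a, b) for b in terms_b) for a in terms_a]
    col = [max(resnik(a, b) for a in terms_a) for b in terms_b]
    return (sum(row) + sum(col)) / (len(row) + len(col))


def rank_genes(gene_annotations, disease_terms):
    """Order genes by decreasing similarity to the disease annotations."""
    scores = {
        gene: best_match_average(terms, disease_terms)
        for gene, terms in gene_annotations.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical input: three candidate genes and one disease annotated with "A1".
genes = {"g1": ["A1"], "g2": ["A2"], "g3": ["B1"]}
ranking = rank_genes(genes, ["A1"])
for gene, score in ranking:
    print(gene, score)
```

In this toy case the gene annotated with the same term as the disease ranks first, the gene annotated with a sibling term ranks second, and the gene whose only shared ancestor is the root ranks last.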
