HPOSim: An R Package for Phenotypic Similarity Measure and Enrichment Analysis Based on the Human Phenotype Ontology

Background: Phenotypic features associated with genes and diseases play an important role in disease-related studies, yet most available methods rely solely on the Online Mendelian Inheritance in Man (OMIM) database and do not use a controlled vocabulary. The Human Phenotype Ontology (HPO) provides a standardized, controlled vocabulary covering phenotypic abnormalities in human diseases and has become a comprehensive resource for the computational analysis of human disease phenotypes. Most existing HPO-based software tools cannot be used offline and provide only a few similarity measures. There is therefore a critical need for a comprehensive, offline software tool for measuring phenotypic similarity based on HPO.

Results: HPOSim is an R package for analyzing phenotypic similarity for genes and diseases based on HPO data. Seven commonly used semantic similarity measures are implemented in HPOSim. Enrichment analysis of gene sets and disease sets is also implemented, including hypergeometric enrichment analysis and network ontology analysis (NOA).

Conclusions: HPOSim can be used to predict disease genes and to explore the disease-related functions of gene modules. HPOSim is open source and freely available at SourceForge (https://sourceforge.net/p/hposim/).
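Two of the analyses summarized above can be illustrated with a small, self-contained sketch. It is written in Python and does not reproduce HPOSim's R API. The ontology, term IDs, disease annotations and function names below are all hypothetical. The abstract does not name the seven similarity measures, so Resnik's measure (the information content of the most informative common ancestor) is used here only as a widely known information-content-based example. The enrichment part computes the upper-tail hypergeometric p-value that a standard over-representation test relies on. NOA is not illustrated.

```python
from math import comb, log

# Hypothetical toy ontology (NOT real HPO data): term -> list of parent terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}

# Hypothetical direct annotations: term -> set of diseases annotated to it.
ANNOTATIONS = {
    "A1": {"d1", "d2"},
    "A2": {"d3"},
    "B1": {"d4"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(total / n_t), where n_t counts objects annotated to t
    or to any descendant of t (annotations propagate up to ancestors)."""
    objects_per_term = {t: set() for t in PARENTS}
    for term, objs in ANNOTATIONS.items():
        for anc in ancestors(term):
            objects_per_term[anc] |= objs
    total = len(objects_per_term["root"])
    # Terms with no annotated objects have undefined IC and are skipped.
    return {t: log(total / len(o)) for t, o in objects_per_term.items() if o}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic.get(t, 0.0) for t in common)


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric: N background genes, K of them
    annotated to a term, n genes in the query set, k observed overlaps."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    ic = information_content()
    print(resnik("A1", "A2", ic))  # IC of common ancestor "A", i.e. log(4/3)
    print(hypergeom_pvalue(N=20, K=5, n=4, k=3))
```

The similarity of two sibling terms is the IC of their shared parent, and two terms that meet only at the root score zero. For the enrichment test, a small p-value indicates that the query gene or disease set overlaps the term's annotations more often than random sampling from the background would predict. In practice, such p-values are usually corrected for multiple testing across terms.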
