A semi-automated methodology for finding lipid-related GO terms

Motivation: Although semantic similarity in Gene Ontology (GO) and other approaches may be used to find similar GO terms, there is yet a method to systematically find a class of GO terms sharing a common property with high accuracy (e.g. involving human curation). Results: We have developed a methodology to address this issue and applied it to identify lipid-related GO terms, owing to the important and varied roles of lipids in many biological processes. Our methodology finds lipid-related GO terms in a semi-automated manner, requiring only moderate manual curation. We first obtain a list of lipid-related gold-standard GO terms by keyword search and manual curation. Then, based on the hypothesis that co-annotated GO terms share similar properties, we develop a machine learning method that expands the list of lipid-related terms from the gold standard. Those terms predicted most likely to be lipid related are examined by a human curator following specific curation rules to confirm the class labels. The structure of GO is also exploited to help reduce the curation effort. The prediction and curation cycle is repeated until no further lipid-related term is found. Our approach has covered a high proportion, if not all, of lipid-related terms with relatively high efficiency. Database URL: http://compbio.ddns.comp.nus.edu.sg/∼lipidgo

[1]  Philip Resnik,et al.  Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language , 1999, J. Artif. Intell. Res..

[2]  David W. Conrath,et al.  Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy , 1997, ROCLING/IJCLCLP.

[3]  Robert Stevens,et al.  Gene Ontology Consortium , 2014 .

[4]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[5]  L. Wong,et al.  Enhancing the utility of Proteomics Signature Profiling (PSP) with Pathway Derived Subnets (PDSs), performance analysis and specialised ontologies , 2013, BMC Genomics.

[6]  Carole A. Goble,et al.  Investigating Semantic Similarity Measures Across the Gene Ontology: The Relationship Between Sequence and Annotation , 2003, Bioinform..

[7]  Steffen Staab,et al.  Taxonomy Learning - Factoring the Structure of a Taxonomy into a Semantic Classification Decision , 2002, COLING.

[8]  Thomas Lengauer,et al.  A new measure for functional similarity of gene products based on Gene Ontology , 2006, BMC Bioinformatics.

[9]  Christian Posse,et al.  XOA: Web-Enabled Cross-Ontological Analytics , 2007, 2007 IEEE Congress on Services (Services 2007).

[10]  Limsoon Wong,et al.  LipidGO: database for lipid-related GO terms and applications , 2014, Bioinform..

[11]  Philip S. Yu,et al.  A new method to measure the semantic similarity of GO terms , 2007, Bioinform..

[12]  Phillip W. Lord,et al.  Semantic Similarity in Biomedical Ontologies , 2009, PLoS Comput. Biol..

[13]  Christoph Steinbeck,et al.  Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology , 2013, BMC Genomics.

[14]  M. Wenk The emerging field of lipidomics , 2005, Nature Reviews Drug Discovery.

[15]  Giorgio Valle,et al.  The Gene Ontology in 2010: extensions and refinements , 2009, Nucleic Acids Res..

[16]  Dekang Lin,et al.  An Information-Theoretic Definition of Similarity , 1998, ICML.

[17]  Xinghua Lu,et al.  Identifying informative subsets of the Gene Ontology with information bottleneck methods , 2010, Bioinform..

[18]  H. Chandler Database , 1985 .