Ontologies in Bioinformatics

Molecular biology is a large, complex and volatile domain that tests knowledge representation techniques to the limit of their fidelity, precision, expressivity and adaptability. Molecular biology and bioinformatics rely heavily on community knowledge, rather than laws and axioms, to further understanding and generate new knowledge. This knowledge has traditionally been kept as natural language. Given the exponential growth of already large quantities of data and associated knowledge, this form of representation is unsustainable. The knowledge needs to be stored in a computationally amenable form, and ontologies offer a mechanism for creating a shared understanding of a community for both humans and computers. Ontologies have been built and used for many domains, and this chapter explores their role within bioinformatics. Structured classifications have a long history in biology, not least in the Linnaean description of species. The explicit use of ontologies, however, is more recent. This chapter surveys the need for ontologies, the nature of the domain and the knowledge tasks involved, and then gives an overview of ontology work in the discipline. The widest use of ontologies within biology is for conceptual annotation: a representation of stored knowledge that is more computationally amenable than natural language. An ontology also offers a means to create the illusion of a common query interface over diverse, distributed information sources: here the ontology provides a shared understanding for the user and a means to computationally reconcile heterogeneities between the resources. Ontologies also provide a means of schema definition suited to the complexity and precision required for biology's knowledge bases.
Most recently, bioinformatics has become well placed as an exemplar of the Semantic Web, offering both web-accessible content and services that are conceptually marked up for computational exploitation of its resources; this theme is explored through the myGRID services ontology. Ontologies in bioinformatics cover a wide range of usages and representation styles. Bioinformatics offers an exciting application area in which the community can see a real need for ontology-based technology to work and deliver on its promise.
