Semantics in Support of Biodiversity Knowledge Discovery: An Introduction to the Biological Collections Ontology and Related Ontologies

The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.
