Identifying and relating biological concepts in the Catalogue of Life

Background
In this paper we describe our experience of adding globally unique identifiers to the Species 2000 and ITIS Catalogue of Life, an on-line index of organisms which is intended, ultimately, to cover all the world's known species. The scientific species names held in the Catalogue are names that already play an extensive role as terms in the organisation of information about living organisms in bioinformatics and other domains, but the effectiveness of their use is hindered by variation in individuals' opinions and understanding of these terms; indeed, in some cases more than one name will have been used to refer to the same organism. This means that it is desirable to be able to give unique labels to each of these differing concepts within the catalogue and to be able to determine which concepts are being used in other systems, in order that they can be associated with the concepts in the catalogue. Not only is this needed, but it is also necessary to know the relationships between alternative concepts that scientists might have employed, as these determine what can be inferred when data associated with related concepts is being processed. A further complication is that the catalogue itself is evolving as scientific opinion changes due to an increasing understanding of life.

Results
We describe how we are using Life Science Identifiers (LSIDs) as globally unique identifiers in the Catalogue of Life, explaining how the mapping to species concepts is performed, how concepts are associated with specific editions of the catalogue, and how the Taxon Concept Schema has been adopted in order to express information about concepts and their relationships.
We explore the implications of using globally unique identifiers in order to refer to abstract concepts such as species, which incorporate at least a measure of subjectivity in their definition, in contrast with the more traditional use of such identifiers to refer to more tangible entities, events, documents, observations, etc.

Conclusions
A major reason for adopting identifiers such as LSIDs is to facilitate data integration. We have demonstrated the incorporation of LSIDs into the Catalogue of Life, in a manner consistent with the biodiversity informatics community's conventions for LSID use. The Catalogue of Life is therefore available as a taxonomy of organisms for use within various disciplines, including biomedical research, by software written with an awareness of these conventions.
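
As a rough illustration of what software "written with an awareness of these conventions" has to handle, the sketch below splits an LSID into its parts. It relies only on the general LSID URN form, urn:lsid:authority:namespace:object with an optional trailing revision. The example identifier, the authority example.org, and the idea of reading the revision field as a catalogue edition are assumptions made for illustration, not details taken from the Catalogue of Life itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    """Components of a Life Science Identifier (general URN form only)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None  # optional; its meaning is set by the issuing authority


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    # The 'urn' and 'lsid' prefixes are matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty authority, namespace, or object: {text!r}")
    # An absent or empty sixth field means no revision was given.
    revision = (parts[5] or None) if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:taxon:1234-abcd:ed2010")
print(example.authority, example.namespace, example.object_id, example.revision)
```

Two identifiers that share an authority, namespace, and object but differ in revision parse to distinct LSID values, which is one way a consumer could keep concepts from different editions apart, assuming the issuing authority uses the revision field in that way.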
