Biodiversity informatics: the challenge of linking data and the role of shared identifiers

A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that much of the information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers [such as Digital Object Identifiers (DOIs) and Life Science Identifiers (LSIDs)], and the implementation of services that link those identifiers.
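To make the ranking idea concrete, the following is a minimal sketch of PageRank by power iteration over a small graph of linked identifiers. It is an illustration only, not the review's implementation: the identifier strings (such as "specimen:V1"), the link structure, the function name `pagerank`, and the damping factor of 0.85 are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Rank nodes of a directed graph given as {source: [targets]}."""
    # Collect every identifier that appears as a source or a target.
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread evenly,
        # so the total rank stays equal to 1.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes an equal share of its rank to every node it links to.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical records: sequences and an image point at the specimen they
# were derived from; sequences also point at the publication that reports them;
# the specimen and the publication both point at a taxonomic name.
links = {
    "sequence:S1": ["specimen:V1", "publication:P1"],
    "sequence:S2": ["specimen:V1", "publication:P1"],
    "image:I1": ["specimen:V1"],
    "specimen:V1": ["name:N1"],
    "publication:P1": ["name:N1"],
    "name:N1": [],
}

ranks = pagerank(links)
for identifier, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{identifier}\t{score:.4f}")
```

In this toy graph the specimen, which is referenced by three other records, ranks above the publication, which is referenced by two, and both rank above the sequences that nothing links to. A search service could use such scores to order results, under the assumption that heavily linked records are the more useful entry points.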
