The taxonomic name resolution service: an online tool for automated standardization of plant names

Background: The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this ‘names problem’ has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science.

Results: The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including the Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets.

Conclusions: We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can aid the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface at http://tnrs.iplantcollaborative.org/ and as a RESTful web service and application programming interface. Source code is available at https://github.com/iPlantCollaborativeOpenSource/TNRS/.
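To make the idea of fuzzy matching followed by synonym resolution concrete, here is a minimal Python sketch. It is not the TNRS implementation or its API: the tiny REFERENCE table, the function names, the plain Levenshtein edit distance and the 0.8 score threshold are all assumptions chosen for illustration. The real service matches against full reference taxonomies such as Tropicos and also parses authorities, families and annotations, none of which is attempted here.

```python
from typing import Optional

# Hypothetical miniature reference taxonomy (illustration only).
# Each known name maps to its accepted name; accepted names map to themselves.
REFERENCE = {
    "Quercus alba": "Quercus alba",
    "Acer rubrum": "Acer rubrum",
    "Solanum lycopersicum": "Solanum lycopersicum",
    "Lycopersicon esculentum": "Solanum lycopersicum",  # synonym of the tomato
}


def levenshtein(a: str, b: str) -> int:
    """Return the edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def resolve(name: str, min_score: float = 0.8) -> Optional[dict]:
    """Fuzzy-match a submitted name, then map the match to its accepted name.

    The score is 1 - distance / (length of the longer string); the 0.8
    threshold is an arbitrary assumption, not a TNRS default.
    """
    query = " ".join(name.split())  # collapse stray whitespace
    best, best_score = None, 0.0
    for candidate in REFERENCE:
        dist = levenshtein(query.lower(), candidate.lower())
        score = 1 - dist / max(len(query), len(candidate))
        if score > best_score:
            best, best_score = candidate, score
    if best is None or best_score < min_score:
        return None  # no sufficiently close name in the reference list
    return {
        "submitted": name,
        "matched": best,
        "accepted": REFERENCE[best],
        "score": round(best_score, 3),
    }


if __name__ == "__main__":
    for raw in ["Quercus albba", "Lycopersicum esculentum", "Zzz"]:
        print(raw, "->", resolve(raw))
```

In this sketch a misspelled accepted name resolves to itself after correction, a misspelled synonym resolves to the accepted name it points to, and a string with no close counterpart returns None so that it can be flagged for manual review.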
