Developing Standards for Improved Data Quality and for Selecting Fit for Use Biodiversity Data

The quality of biodiversity data publicly accessible via aggregators such as GBIF (Global Biodiversity Information Facility), ALA (Atlas of Living Australia), iDigBio (Integrated Digitized Biocollections), and OBIS (Ocean Biogeographic Information System) is often questioned, especially by the research community. The Data Quality Interest Group, established by Biodiversity Information Standards (TDWG) and GBIF, has been engaged in four main activities: developing a framework for the assessment and management of data quality using a fitness-for-use approach; defining a core set of standardised tests and associated assertions based on Darwin Core terms; gathering and classifying user stories to form contextual-themed use cases, such as …
