Fishing for data and sorting the catch: assessing the data quality, completeness and fitness for use of data in marine biogeographic databases

Being able to assess the quality and completeness of data has become indispensable in marine biodiversity research, especially when dealing with large databases that compile data from a variety of sources. Very few integrated databases offer quality flags at the level of individual records, which makes it hard for users to extract the data that are fit for their specific purposes. This article describes the steps developed to analyse the quality and completeness of the distribution records in the European and international Ocean Biogeographic Information Systems (EurOBIS and OBIS). Records are checked for data format, completeness and validity of information, the quality and level of detail of the taxonomy and geographic information used, and whether or not the record is a putative outlier. The corresponding quality control (QC) flags will not only help users select data but will also help the data management team and the data custodians identify gaps and errors in the submitted data, providing scope to improve data quality. The results of these QC procedures are now available in both the EurOBIS and OBIS databases. Through the Biology portal of the European Marine Observation and Data Network (EMODnet Biology), users are offered a subset of EurOBIS records that pass a specific combination of these QC steps. In the future, EMODnet Biology will offer a wide range of filter options through its portal, allowing users to make specific selections themselves. Through LifeWatch, users can already upload their own data and check them against a selection of the QC procedures described here.

Database URL: www.eurobis.org (www.iobis.org; www.emodnet-biology.eu/)
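
The abstract names the kinds of checks (format, completeness, validity, outliers) and the idea of per-record QC flags, but not how such flags might be represented or combined into a user selection. The following is a minimal Python sketch under stated assumptions: a simplified record with four Darwin Core-like fields, hypothetical flag names and bit values, and a median absolute deviation (MAD) test on latitude per taxon with an arbitrary threshold of 3. None of these are the actual EurOBIS/OBIS flag definitions or outlier method; the taxonomic and geographic-detail checks are omitted.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntFlag
from statistics import median
from typing import Dict, List, Optional, Set


class QCFlag(IntFlag):
    """Hypothetical record-level QC flags; names and bit values are illustrative."""
    NONE = 0
    MISSING_FIELD = 1    # completeness: a required field is empty
    BAD_COORDINATES = 2  # validity: latitude or longitude out of range
    BAD_DATE = 4         # format: event date is not a parseable ISO 8601 date
    OUTLIER = 8          # latitude is a putative outlier for its taxon


@dataclass
class Record:
    """Simplified occurrence record with a few Darwin Core-like fields (assumed)."""
    scientific_name: Optional[str]
    latitude: Optional[float]
    longitude: Optional[float]
    event_date: Optional[str]


def _valid_coords(rec: Record) -> bool:
    return (rec.latitude is not None and rec.longitude is not None
            and -90 <= rec.latitude <= 90 and -180 <= rec.longitude <= 180)


def check_record(rec: Record) -> QCFlag:
    """Run the per-record completeness, validity and format checks."""
    flags = QCFlag.NONE
    if (not rec.scientific_name or rec.latitude is None
            or rec.longitude is None or not rec.event_date):
        flags |= QCFlag.MISSING_FIELD
    if rec.latitude is not None and rec.longitude is not None and not _valid_coords(rec):
        flags |= QCFlag.BAD_COORDINATES
    if rec.event_date:
        try:
            date.fromisoformat(rec.event_date)
        except ValueError:
            flags |= QCFlag.BAD_DATE
    return flags


def find_outliers(records: List[Record], k: float = 3.0) -> Set[int]:
    """Indices of records whose latitude lies more than k scaled MADs from the
    median latitude of all records of the same taxon with valid coordinates."""
    by_taxon: Dict[str, List[int]] = {}
    for i, rec in enumerate(records):
        if rec.scientific_name and _valid_coords(rec):
            by_taxon.setdefault(rec.scientific_name, []).append(i)
    outliers: Set[int] = set()
    for idxs in by_taxon.values():
        lats = [records[i].latitude for i in idxs]
        med = median(lats)
        mad = median([abs(x - med) for x in lats])
        if mad == 0:
            continue  # no spread to compare against; leave this taxon unflagged
        # 1.4826 * MAD estimates the standard deviation for normally distributed data.
        for i in idxs:
            if abs(records[i].latitude - med) > k * 1.4826 * mad:
                outliers.add(i)
    return outliers


def run_qc(records: List[Record], k: float = 3.0) -> List[QCFlag]:
    """Combine the per-record checks with the dataset-level outlier check."""
    flags = [check_record(rec) for rec in records]
    for i in find_outliers(records, k):
        flags[i] |= QCFlag.OUTLIER
    return flags


def select(records: List[Record], flags: List[QCFlag], reject: QCFlag) -> List[Record]:
    """Keep only the records that raised none of the flags in `reject`."""
    return [rec for rec, f in zip(records, flags) if not (f & reject)]
```

Keeping each check as its own bit means a portal could offer any combination of checks as a filter, in the spirit of the EMODnet Biology subset described above, simply by changing `reject` (for example, tolerating putative outliers but rejecting invalid coordinates) rather than re-running the checks.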
