Displaying bias in sampling effort of data accessed from biodiversity databases using ignorance maps

Abstract

Background
Open-access biodiversity databases, which consist mainly of citizen science data, make temporally and spatially extensive species observation data available to a wide range of users. Such data have limitations, however, including sampling bias towards areas where recorders are concentrated, a lack of assessment of survey effort, and incomplete coverage of the distribution of all organisms. These limitations are not always documented, yet any technical assessment or scientific research based on such data should include an evaluation of the uncertainty of its source data, and researchers should acknowledge this uncertainty in their analyses. The maps of ignorance proposed here are a critical and easily implemented tool, not only for visually exploring the quality of the data but also for filtering out unreliable results.

New information
I present simple algorithms to display ignorance maps as a tool for reporting the spatial distribution of bias and lack of sampling effort across a study region. Ignorance scores are based solely on raw data in order to rely on as few assumptions as possible; no prediction or estimation is involved. The rationale rests on the assumption that species groups are an appropriate surrogate for sampling effort, because an entire group of species observed by similar methods is likely to share similar bias. Simple algorithms then transform raw data into ignorance scores scaled from 0 to 1, which are easily comparable and scalable. Because calculations must be performed over large datasets, simplicity is crucial for web-based implementations on biodiversity information infrastructures. With these algorithms, any biodiversity information infrastructure can offer a quality report on the observations accessed through it. Users can specify a reference taxonomic group and a time frame according to their research question.
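The abstract does not spell out how raw records become per-area tallies. As a minimal sketch, assuming a regular longitude/latitude grid and a hypothetical record schema (`Observation`, `grid_cell` and `count_reference_observations` are illustrative names, not part of any infrastructure's API), the step of selecting a reference taxonomic group and a time frame and then counting raw observations per grid cell could look like this:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One raw occurrence record (hypothetical minimal schema, an assumption)."""
    taxon_group: str  # reference-group label, e.g. "Aves"
    lon: float        # longitude, decimal degrees
    lat: float        # latitude, decimal degrees
    obs_date: date    # date of the observation


def grid_cell(lon: float, lat: float, cell_size: float) -> tuple[int, int]:
    """Return the (column, row) index of the regular grid cell holding a point."""
    # Floor division keeps cell indices contiguous across negative coordinates.
    return (int(lon // cell_size), int(lat // cell_size))


def count_reference_observations(records, reference_group: str,
                                 start: date, end: date,
                                 cell_size: float = 0.1) -> Counter:
    """Tally raw observations of the reference group per grid cell.

    Hypothetical helper: no modelling or estimation, just a count of records.
    """
    counts: Counter = Counter()
    for rec in records:
        # Keep only the user-chosen reference taxonomic group...
        if rec.taxon_group != reference_group:
            continue
        # ...and only records inside the user-chosen time frame (inclusive).
        if not start <= rec.obs_date <= end:
            continue
        counts[grid_cell(rec.lon, rec.lat, cell_size)] += 1
    return counts


if __name__ == "__main__":
    records = [
        Observation("Aves", 18.05, 59.33, date(2014, 5, 1)),
        Observation("Aves", 18.07, 59.34, date(2014, 6, 2)),
        Observation("Aves", 17.64, 59.86, date(2014, 8, 15)),
        Observation("Aves", 17.64, 59.86, date(2013, 4, 9)),      # outside time frame
        Observation("Insecta", 18.06, 59.33, date(2014, 7, 3)),  # other group
    ]
    counts = count_reference_observations(
        records, "Aves", date(2014, 1, 1), date(2014, 12, 31), cell_size=1.0)
    # (18, 59) -> 2 records, (17, 59) -> 1 record
    print(counts)
```

Only the choice of group, time frame and grid resolution enters this step, which keeps it cheap enough to run over large datasets.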
The potential of this tool lies in the simplicity of its algorithms and in the lack of assumptions made about the bias distribution, giving the user the freedom to tailor analyses to their specific needs.
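The abstract states only that raw data are transformed into ignorance scores scaled from 0 to 1. Two simple transformations with that property are sketched below as illustrative assumptions, not necessarily the exact algorithms used here. Both take a mapping from grid cell to the number of observations of the reference group, plus the full list of cells in the study region so that unsampled cells score 1. `normalized_ignorance`, `half_ignorance` and the parameter `o_half` are hypothetical names.

```python
from typing import Dict, Hashable, Iterable, Mapping

# Hypothetical helpers; both transformations are illustrative assumptions.


def normalized_ignorance(counts: Mapping[Hashable, int],
                         cells: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Linear score 1 - n / max(n): 1 for unsampled cells, 0 for the best-sampled cell."""
    cells = list(cells)  # iterated twice below
    max_n = max((counts.get(c, 0) for c in cells), default=0)
    if max_n == 0:
        # Nothing recorded anywhere: total ignorance in every cell.
        return {c: 1.0 for c in cells}
    return {c: 1.0 - counts.get(c, 0) / max_n for c in cells}


def half_ignorance(counts: Mapping[Hashable, int],
                   cells: Iterable[Hashable],
                   o_half: float) -> Dict[Hashable, float]:
    """Saturating score o_half / (n + o_half).

    Gives 1 for an unsampled cell, exactly 0.5 when n == o_half, and tends
    towards 0 as n grows. o_half is chosen by the user.
    """
    if o_half <= 0:
        raise ValueError("o_half must be positive")
    return {c: o_half / (counts.get(c, 0) + o_half) for c in cells}


if __name__ == "__main__":
    counts = {(0, 0): 10, (0, 1): 2}
    cells = [(0, 0), (0, 1), (1, 0)]  # (1, 0) has no records
    print(normalized_ignorance(counts, cells))
    print(half_ignorance(counts, cells, o_half=2))
```

The linear score depends on the single best-sampled cell, so adding or removing one heavily recorded site rescales the whole map. The saturating score instead uses a fixed, user-chosen `o_half`, so scores stay comparable across regions and time frames; choosing `o_half` is one way a user could tailor the analysis to a given reference group.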
