Analytical guidelines to increase the value of citizen science data: using eBird data to estimate species occurrence

Citizen science data are valuable for addressing a wide range of ecological research questions, and the scope and volume of available data have increased rapidly. However, data from large-scale citizen science projects typically present a number of challenges that can inhibit robust ecological inference. These challenges include species bias, spatial bias, and variation in effort. To demonstrate how key challenges in analysing citizen science data can be addressed, we use the example of estimating species distributions with data from eBird, a large semi-structured citizen science project. We estimate two widely applied metrics of species distributions: encounter rate and occupancy probability. For each metric, we assess the impact of data processing steps that either degrade or refine the data used in the analyses. We also test whether differences in model performance are maintained at different sample sizes. Model performance improved when data processing and analytical methods addressed the challenges arising from citizen science data. The largest gains in model performance were achieved with (1) the use of complete checklists (where observers report all the species they detect and identify) and (2) the use of covariates describing variation in effort and detectability for each checklist. Occupancy models were more robust than encounter rate models to a lack of complete checklists and effort variables. Improvements in model performance with data refinement were more evident with larger sample sizes. Here, we describe processes to refine semi-structured citizen science data to estimate species distributions. We demonstrate the value of complete checklists, which can inform the design and adaptation of citizen science projects, and the value of information on effort. The methods we have outlined are also likely to improve other forms of inference, and will enable researchers to conduct robust analyses and harness the vast ecological knowledge that exists within citizen science data.
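
As an illustration of the kind of data refinement described above, the following minimal Python sketch keeps only complete checklists within fixed effort limits and then computes a naive encounter rate (the proportion of checklists reporting the target species). It is a sketch under stated assumptions, not the analysis used in the study: the `Checklist` fields, the `refine` and `encounter_rate` helpers, the effort limits, and the example records are all hypothetical, and no model with effort or detectability covariates is fitted.

```python
from dataclasses import dataclass


@dataclass
class Checklist:
    # All field names are hypothetical; they are not eBird column names.
    checklist_id: str
    complete: bool        # observer reported all species detected and identified
    duration_min: float   # effort: time spent observing
    distance_km: float    # effort: distance travelled
    detected: bool        # target species reported on this checklist


def refine(checklists, max_duration_min=300.0, max_distance_km=5.0):
    """Keep complete checklists whose effort is within the given limits.

    The default limits are illustrative assumptions only.
    """
    return [
        c for c in checklists
        if c.complete
        and c.duration_min <= max_duration_min
        and c.distance_km <= max_distance_km
    ]


def encounter_rate(checklists):
    """Proportion of checklists on which the target species was reported."""
    if not checklists:
        raise ValueError("no checklists to summarise")
    return sum(c.detected for c in checklists) / len(checklists)


# Made-up records for illustration.
checklists = [
    Checklist("a", True, 60, 1.0, True),
    Checklist("b", True, 45, 0.5, False),
    Checklist("c", False, 30, 0.2, True),    # incomplete: dropped by refine()
    Checklist("d", True, 600, 20.0, False),  # effort outside the limits: dropped
    Checklist("e", True, 90, 2.0, True),
    Checklist("f", True, 20, 0.0, True),
]

refined = refine(checklists)
raw_rate = encounter_rate(checklists)
refined_rate = encounter_rate(refined)
print(f"raw: {raw_rate:.2f}, refined: {refined_rate:.2f}")
```

In a fuller analysis, the effort fields retained here (duration and distance) would also be passed to the model as covariates rather than used only for filtering.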
