Building essential biodiversity variables (EBVs) of species distribution and abundance at a global scale

Much biodiversity data is collected worldwide, but it remains challenging to assemble the scattered knowledge for assessing biodiversity status and trends. The concept of Essential Biodiversity Variables (EBVs) was introduced to structure biodiversity monitoring globally, and to harmonize and standardize biodiversity data from disparate sources to capture a minimum set of critical variables required to study, report and manage biodiversity change. Here, we assess the challenges of a ‘Big Data’ approach to building global EBV data products across taxa and spatiotemporal scales, focusing on species distribution and abundance. The majority of currently available data on species distributions derives from incidentally reported observations or from surveys where presence‐only or presence–absence data are sampled repeatedly with standardized protocols. Most abundance data come from opportunistic population counts or from population time series using standardized protocols (e.g. repeated surveys of the same population from single or multiple sites). Enormous complexity exists in integrating these heterogeneous, multi‐source data sets across space, time, taxa and different sampling methods. Integration of such data into global EBV data products requires correcting biases introduced by imperfect detection and varying sampling effort, dealing with different spatial resolutions and extents, harmonizing measurement units from different data sources or sampling methods, applying statistical tools and models for spatial inter‐ or extrapolation, and quantifying sources of uncertainty and errors in data and models. To support the development of EBVs by the Group on Earth Observations Biodiversity Observation Network (GEO BON), we identify 11 key workflow steps that will operationalize the process of building EBV data products within and across research infrastructures worldwide.
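Correcting for imperfect detection usually relies on hierarchical models. These separate the probability that a site is occupied from the probability that the species is detected when present. The sketch below is an illustration only, not the workflow described here. It fits a basic single-season occupancy model to invented detection counts from repeated visits, where psi is occupancy and p is the per-visit detection probability. It uses a coarse grid search instead of a numerical optimizer. The model assumes that occupancy does not change between visits, that visits are independent and that p is constant. All function names and data are hypothetical.

```python
import math


def occupancy_log_likelihood(psi, p, detections, n_visits):
    """Log-likelihood of a basic single-season occupancy model.

    psi: probability that a site is occupied
    p: probability of detecting the species on one visit, given presence
    detections: number of visits with a detection, one entry per site
    n_visits: number of repeat visits per site (assumed equal for all sites)
    """
    ll = 0.0
    for y in detections:
        # Probability of y detections given occupancy; the binomial
        # coefficient is dropped because it does not depend on psi or p.
        occupied = psi * p**y * (1 - p) ** (n_visits - y)
        if y == 0:
            # An all-zero history can come from an occupied site where the
            # species was missed on every visit, or from an unoccupied site.
            ll += math.log(occupied + (1 - psi))
        else:
            ll += math.log(occupied)
    return ll


def fit_occupancy(detections, n_visits, step=0.01):
    """Maximise the log-likelihood over a regular (psi, p) grid in (0, 1)."""
    grid = [i * step for i in range(1, int(round(1 / step)))]
    return max(
        ((psi, p) for psi in grid for p in grid),
        key=lambda params: occupancy_log_likelihood(*params, detections, n_visits),
    )


# Hypothetical data: number of detections out of 3 visits at each of 10 sites.
detections = [2, 1, 0, 3, 0, 1, 0, 2, 0, 0]
psi_hat, p_hat = fit_occupancy(detections, n_visits=3)
naive = sum(y > 0 for y in detections) / len(detections)
print(f"naive occupancy {naive:.2f}, estimated psi {psi_hat:.2f}, p {p_hat:.2f}")
```

With these invented counts, the estimated psi lands a little above the naive proportion of sites with at least one detection (0.5). This happens because some all-zero sites are attributed to missed detections rather than to true absence. A real analysis would use a proper optimizer or a Bayesian sampler, and would let p vary with covariates such as sampling effort.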
The 11 workflow steps take multiple sequential activities into account, including identification and aggregation of various raw data sources, data quality control, taxonomic name matching and statistical modelling of integrated data. We illustrate these steps with concrete examples from existing citizen science and professional monitoring projects, including eBird, the Tropical Ecology Assessment and Monitoring network, the Living Planet Index and the Baltic Sea zooplankton monitoring. The identified workflow steps are applicable to both terrestrial and aquatic systems and a broad range of spatial, temporal and taxonomic scales. They depend on clear, findable and accessible metadata, and we provide an overview of current data and metadata standards. Several challenges remain to be solved for building global EBV data products: (i) developing tools and models for combining heterogeneous, multi‐source data sets and filling data gaps in geographic, temporal and taxonomic coverage, (ii) integrating emerging methods and technologies for data collection such as citizen science, sensor networks, DNA‐based techniques and satellite remote sensing, (iii) solving major technical issues related to data product structure, data storage, execution of workflows and the production process/cycle as well as achieving technical interoperability among research infrastructures, (iv) allowing semantic interoperability by developing and adopting standards and tools for capturing consistent data and metadata, and (v) ensuring legal interoperability by endorsing open data or data that are free from restrictions on use, modification and sharing. Addressing these challenges is critical for biodiversity research and for assessing progress towards conservation policy targets and sustainable development goals.
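Two of the early workflow steps named above, data quality control and taxonomic name matching, can be illustrated with a small record-cleaning function. The sketch below is a hypothetical example, not the authors' implementation. It works in four stages:

- It drops records with missing, out-of-range or placeholder (0, 0) coordinates.
- It normalises name strings.
- It resolves synonyms against a tiny hand-made checklist.
- It falls back to fuzzy matching with Python's standard `difflib` module for misspellings.

The checklist, the 0.85 similarity cutoff and all function names are assumptions. Production workflows would query a curated taxonomic backbone rather than a hard-coded dictionary.

```python
import difflib

# Hypothetical, illustrative checklist mapping accepted names and known
# synonyms to the accepted name. A real workflow would query a taxonomic backbone.
ACCEPTED_NAMES = {
    "Cyanistes caeruleus": "Cyanistes caeruleus",
    "Parus caeruleus": "Cyanistes caeruleus",  # older name of the blue tit
    "Chloris chloris": "Chloris chloris",
    "Carduelis chloris": "Chloris chloris",    # older name of the European greenfinch
}


def normalize_name(raw):
    """Keep genus and specific epithet only, with 'Genus species' capitalisation.

    Authorship strings are dropped; infraspecific ranks are ignored in this sketch.
    """
    parts = raw.split()
    if len(parts) < 2:
        return None
    return f"{parts[0].capitalize()} {parts[1].lower()}"


def match_name(raw, cutoff=0.85):
    """Return the accepted name for a raw name string, or None if unmatched."""
    name = normalize_name(raw)
    if name is None:
        return None
    if name in ACCEPTED_NAMES:
        return ACCEPTED_NAMES[name]
    # Fuzzy fallback for misspelled names.
    close = difflib.get_close_matches(name, ACCEPTED_NAMES.keys(), n=1, cutoff=cutoff)
    return ACCEPTED_NAMES[close[0]] if close else None


def passes_quality_checks(record):
    """Basic coordinate checks; real pipelines apply many more tests."""
    lat, lon = record.get("lat"), record.get("lon")
    if lat is None or lon is None:
        return False
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return False
    if lat == 0 and lon == 0:  # frequent placeholder for missing coordinates
        return False
    return True


def clean_records(records):
    """Drop records failing quality control and replace names by accepted names."""
    cleaned = []
    for rec in records:
        if not passes_quality_checks(rec):
            continue
        accepted = match_name(rec["name"])
        if accepted is None:
            continue
        cleaned.append({**rec, "name": accepted})
    return cleaned


raw_records = [
    {"name": "parus  caeruleus", "lat": 59.3, "lon": 18.1},    # synonym, messy spacing
    {"name": "Cyanistes caerulus", "lat": 55.7, "lon": 12.6},  # misspelled epithet
    {"name": "Carduelis chloris", "lat": 0.0, "lon": 0.0},     # placeholder coordinates
    {"name": "Chloris chloris", "lat": 95.0, "lon": 10.0},     # latitude out of range
    {"name": "Unknown", "lat": 50.0, "lon": 10.0},             # not a binomial name
]
cleaned = clean_records(raw_records)
for rec in cleaned:
    print(rec["name"], rec["lat"], rec["lon"])
```

Here quality control runs before name matching, so records with unusable coordinates are discarded without a taxonomic lookup. Other pipelines might instead keep and flag such records rather than drop them.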
