A survey of biodiversity informatics: Concepts, practices, and challenges

The unprecedented size of the human population, along with its associated economic activities, has an ever‐increasing impact on global environments. Countries across the world are concerned about growing resource consumption and about the capacity of ecosystems to keep providing those resources. To conserve biodiversity effectively, indicators and knowledge must be made openly available to decision‐makers in forms they can readily use. Developing and deploying the tools and techniques that generate these indicators requires access to trustworthy data from biological collections, field surveys, automated sensors, molecular sources, and the historical academic literature. Transforming these raw data into synthesized information that is fit for use involves many refinement steps. The methodologies and techniques applied to manage and analyze these data constitute the field usually called biodiversity informatics. Biodiversity data follow a life cycle consisting of planning, collection, certification, description, preservation, discovery, integration, and analysis. Researchers, whether producers or consumers of biodiversity data, will likely perform activities related to at least one of these stages. This article explores each stage of the biodiversity data life cycle, discussing its methodologies, tools, and challenges.


[250]  A. Townsend Peterson,et al.  kuenm: an R package for detailed development of ecological niche models using Maxent , 2019, PeerJ.

[251]  Cláudio T. Silva,et al.  Managing Rapidly-Evolving Scientific Workflows , 2006, IPAW.

[252]  Helio J. C. Barbosa,et al.  SISS-Geo: Leveraging Citizen Science to Monitor Wildlife Health Risks in Brazil , 2018, Journal of Healthcare Informatics Research.

[253]  Miguel B. Araújo,et al.  Selecting areas for species persistence using occurrence data , 2000 .

[254]  Alberto Apostolico,et al.  Global Biodiversity Informatics Outlook: Delivering biodiversity knowledge in the information age , 2013 .

[255]  Robert P. Anderson,et al.  Evaluating predictive models of species’ distributions: criteria for selecting optimal models , 2003 .

[256]  J. Bascompte Networks in ecology , 2007 .

[257]  Robert Pergl,et al.  "Data Stewardship Wizard": A Tool Bringing Together Researchers, Data Stewards, and Data Experts around Data Management Planning , 2019, Data Sci. J..

[258]  Jorge Soberón Niche and area of distribution modeling: a population ecology perspective , 2010 .

[259]  J. Bascompte,et al.  Ecological networks: beyond food webs , 2008 .

[260]  Shu-Hsien Liao,et al.  Data mining techniques and applications - A decade review from 2000 to 2011 , 2012, Expert Syst. Appl..

[261]  Mark Newman,et al.  Networks: An Introduction , 2010 .

[262]  Albert-László Barabási,et al.  Statistical mechanics of complex networks , 2001, ArXiv.

[263]  Anton Güntsch,et al.  The Biodiversity Informatics Landscape: Elements, Connections and Opportunities , 2017 .

[264]  Jarrett E. K. Byrnes,et al.  A global synthesis reveals biodiversity loss as a major driver of ecosystem change , 2012, Nature.

[265]  Pasquale Pagano,et al.  Supporting Biodiversity Studies by the EUBrazilOpenBio Hybrid Data Infrastructure , 2013 .

[266]  Tony Rees,et al.  Taxamatch, an Algorithm for Near (‘Fuzzy’) Matching of Scientific Names in Taxonomic Databases , 2014, PloS one.

[267]  Lisa Drew,et al.  Are We Losing the Science of Taxonomy? , 2011 .

[268]  Christopher R. Stephens,et al.  Using Biotic Interaction Networks for Prediction in Biodiversity and Emerging Diseases , 2008, PloS one.

[269]  Santiago José Elías Velazco,et al.  ENMTML: An R package for a straightforward construction of complex ecological niche models , 2020, Environ. Model. Softw..

[270]  T. Dawson,et al.  Selecting thresholds of occurrence in the prediction of species distributions , 2005 .

[271]  D. Roberts,et al.  Hyperspectral discrimination of tropical rain forest tree species at leaf to crown scales , 2005 .

[272]  N. Pettorelli,et al.  Essential Biodiversity Variables , 2013, Science.

[273]  Marie-Stéphanie Samain,et al.  Data Mining for Global Trends in Mountain Biodiversity , 2011 .

[274]  Bas E. Dutilh,et al.  SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data , 2015, Bioinform..

[275]  Robert P. Anderson,et al.  Ecological Niches and Geographic Distributions , 2011 .

[276]  Ian M. Mitchell,et al.  Best Practices for Scientific Computing , 2012, PLoS biology.

[277]  M. Araújo,et al.  Uses and misuses of bioclimatic envelope modeling. , 2012, Ecology.

[278]  Anne Bowser,et al.  The Bari Manifesto: An interoperability framework for essential biodiversity variables , 2019, Ecol. Informatics.

[279]  Indra Neil Sarkar,et al.  Taxongrab: Extracting Taxonomic Names from Text , 2005 .

[280]  José Laurindo Campos dos Santos,et al.  Biodiversity and Integrated Environmental Monitoring , 2013 .

[281]  Matthew B. Jones,et al.  Challenges and Opportunities of Open Data in Ecology , 2011, Science.

[282]  Louise McRae,et al.  Global biodiversity monitoring: From data sources to Essential Biodiversity Variables , 2017 .

[283]  Luiz M. R. Gadelha,et al.  Model-R: A Framework for Scalable and Reproducible Ecological Niche Modeling , 2017, CARLA.

[284]  Kristin Vanderbilt,et al.  Long term ecological research and information management , 2011, Ecol. Informatics.

[285]  Antonio Mauro Saraiva,et al.  A conceptual framework for quality assessment and management of biodiversity data , 2017, PloS one.

[286]  Robert Cubey,et al.  Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach , 2012, ZooKeys.

[287]  Barry Smith,et al.  Semantics in Support of Biodiversity Knowledge Discovery: An Introduction to the Biological Collections Ontology and Related Ontologies , 2014, PloS one.

[288]  Norman F Johnson,et al.  Biodiversity informatics. , 2007, Annual review of entomology.

[289]  P. Bryan Heidorn,et al.  Shedding Light on the Dark Data in the Long Tail of Science , 2008, Libr. Trends.

[290]  Daniel S. Park,et al.  A checklist for maximizing reproducibility of ecological niche models , 2019, Nature Ecology & Evolution.