A Survey of e-Biodiversity: Concepts, Practices, and Challenges

The unprecedented size of the human population, along with its associated economic activities, has an ever-increasing impact on global environments. Across the world, countries are concerned about growing resource consumption and the capacity of ecosystems to supply those resources. To conserve biodiversity effectively, it is essential to make indicators and knowledge openly available to decision-makers in forms they can readily use. The development and deployment of mechanisms to produce these indicators depend on access to trustworthy data from field surveys and automated sensors, biological collections, molecular data, and historical academic literature. Transforming this raw data into synthesized information that is fit for use requires many refinement steps. The methodologies and techniques used to manage and analyze this data comprise an area often called biodiversity informatics (or e-Biodiversity). Biodiversity data follows a life cycle consisting of planning, collection, certification, description, preservation, discovery, integration, and analysis. Researchers, whether producers or consumers of biodiversity data, will likely perform activities related to at least one of these steps. This article explores each stage of the life cycle of biodiversity data, discussing its methodologies, tools, and challenges.
