Linking DNA Metabarcoding and Text Mining to Create Network-Based Biomonitoring Tools: A Case Study on Boreal Wetland Macroinvertebrate Communities

Abstract: Ecological networks are powerful tools for visualizing biodiversity data and assessing ecosystem health and function. Constructing these networks requires considerable empirical effort and remains highly challenging because of sampling limitations and the laborious, error-prone process of traditional taxonomic identification. Recent advances in high-throughput gene sequencing and high-performance computing provide new ways to address these challenges. DNA metabarcoding, a method of bulk taxonomic identification from DNA extracted from environmental samples, can generate detailed biodiversity information through a standardizable analytical pipeline for species detection. When this biodiversity information is annotated with prior knowledge of taxon interactions, body size, and trophic position, it is possible to generate trait-based networks, which we call “heuristic food webs”. Curating trait matrices for heuristic food webs through manual literature surveys is laborious and often intractable, but it can be greatly accelerated by text mining, which allows knowledge of relevant traits to be gathered across large databases. To explore this possibility, we used the General Architecture for Text Engineering (GATE) system to create a hybrid text-mining pipeline combining rule-based and machine-learning modules. This pipeline was then used to query online repositories of published papers for missing data on a key trait, body size, that could not be gathered from existing trophic link libraries of freshwater benthic macroinvertebrates. Combining text-mined body size information with feeding information from existing sources allowed us to generate a database of over 20,000 pairwise trophic interactions. Next, we developed a pipeline that takes taxa lists generated from DNA metabarcoding and annotates them with trophic information from existing databases and text-mined body size data.
In this way, we generated heuristic food webs for wetland sites within a large delta complex formed by the confluence of the Peace and Athabasca rivers in northern Alberta: the Peace–Athabasca delta. Finally, we used these putative food webs and their network properties to resolve spatial and temporal differences between the benthic subwebs of wetlands in the Peace and Athabasca sectors of the delta complex. Specifically, we asked two questions. (1) How do food web properties (e.g., number of links, linkage density, trophic height) differ between the wetlands of the Peace and Athabasca deltas? (2) How do food web properties change temporally in wetlands of the two deltas? We discuss the use of DNA-generated, trait-based food webs as a powerful tool for rapid bioassessment, assess the limitations of our current approach, and outline a path forward to make this tool more widely available to land managers and conservation biologists.
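
The core step described above, turning a list of detected taxa into a heuristic food web and summarizing it, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the taxon names, the link library, and the function names are hypothetical placeholders, and body size annotation and trophic height are left out. It assumes the commonly used definitions of linkage density (links per taxon, L/S) and directed connectance (L/S^2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebProperties:
    n_taxa: int             # S: number of taxa detected at the site
    n_links: int            # L: number of trophic links among those taxa
    linkage_density: float  # L / S
    connectance: float      # L / S**2 (directed connectance)


def build_heuristic_web(detected_taxa, link_library):
    """Keep only library links whose consumer and resource were both detected."""
    taxa = set(detected_taxa)
    return {(consumer, resource)
            for consumer, resource in link_library
            if consumer in taxa and resource in taxa}


def web_properties(detected_taxa, links):
    """Summarize a web with simple structural properties."""
    s = len(set(detected_taxa))
    n_links = len(links)
    if s == 0:
        return WebProperties(0, 0, 0.0, 0.0)
    return WebProperties(s, n_links, n_links / s, n_links / (s * s))


# Hypothetical link library of (consumer, resource) pairs. A real library
# would come from published trophic databases, not from this toy set.
link_library = {
    ("taxon_C", "taxon_B"),
    ("taxon_C", "taxon_A"),
    ("taxon_B", "taxon_A"),
    ("taxon_D", "taxon_A"),  # taxon_D is not detected at the site below
}

# Hypothetical taxa list, standing in for a metabarcoding result at one site.
site_taxa = ["taxon_A", "taxon_B", "taxon_C"]

links = build_heuristic_web(site_taxa, link_library)
props = web_properties(site_taxa, links)
print(props)
```

Computing the same summary for each site and sampling date would give per-site values that can then be compared between delta sectors or through time, which is the kind of comparison the two questions above call for.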
