Fine scale prediction of ecological community composition using a two-step sequential Machine Learning ensemble

Prediction is one of the last frontiers in ecology. Indeed, predicting fine-scale species composition in natural systems is a complex challenge because multiple abiotic and biotic processes operate simultaneously to determine local species abundances. On the one hand, species' intrinsic performance and their tolerance limits to different abiotic pressures modulate species abundances. On the other hand, there is growing recognition that species interactions play an equally important role in limiting or promoting such abundances within ecological communities. Here, we present a joint effort between ecologists and data scientists to predict species abundances with data-driven models using reasonably easy-to-obtain data. We propose a sequential data-driven modeling approach that in a first step predicts potential species abundances from abiotic variables, and in a second step uses these predictions to model the realized abundances after accounting for species competition. Using a curated data set spanning five years, we predict fine-scale species abundances in a highly diverse annual plant community. Our models show remarkable spatial predictive accuracy using only variables that are easy to measure in the field, yet this predictive power is lost when temporal dynamics are taken into account. This result suggests that predicting future abundances requires longer time series to capture enough variability. In addition, we show that these data-driven models can also suggest how to improve mechanistic models by adding missing variables that affect species performance, such as particular soil conditions (e.g. carbonate availability in our case). Robust models for predicting fine-scale species composition, informed by a mechanistic understanding of the underlying abiotic and biotic processes, can be a pivotal tool for conservation, especially given the rapid human-induced environmental changes we are experiencing. This objective can be achieved by building on the knowledge gained from both classic modeling approaches in ecology and recently developed data-driven models.
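
The two-step sequential idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the learner (a scikit-learn random forest), the function names `fit_two_step` and `predict_two_step`, the hyperparameters, and the synthetic data are all assumptions made for illustration. Step 1 predicts a potential abundance for each species from abiotic variables alone; step 2 predicts the realized abundance from the abiotic variables plus the step-1 predictions for all other species, which stand in for competitors.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_two_step(abiotic, abundances, seed=0):
    """Fit one model per species and per step (hypothetical helper).

    abiotic:    array of shape (n_plots, n_variables)
    abundances: array of shape (n_plots, n_species)
    """
    n_species = abundances.shape[1]

    # Step 1: potential abundance of each species from abiotic variables only.
    step1 = [
        RandomForestRegressor(n_estimators=50, random_state=seed).fit(
            abiotic, abundances[:, s]
        )
        for s in range(n_species)
    ]
    potential = np.column_stack([m.predict(abiotic) for m in step1])

    # Step 2: realized abundance from abiotic variables plus the predicted
    # potential abundances of every other species (the assumed competitors).
    step2 = []
    for s in range(n_species):
        competitors = np.delete(potential, s, axis=1)
        features = np.hstack([abiotic, competitors])
        step2.append(
            RandomForestRegressor(n_estimators=50, random_state=seed).fit(
                features, abundances[:, s]
            )
        )
    return step1, step2


def predict_two_step(step1, step2, abiotic):
    """Chain both steps on new plots; returns (n_plots, n_species)."""
    potential = np.column_stack([m.predict(abiotic) for m in step1])
    realized = []
    for s, model in enumerate(step2):
        competitors = np.delete(potential, s, axis=1)
        realized.append(model.predict(np.hstack([abiotic, competitors])))
    return np.column_stack(realized)


# Synthetic data, for illustration only: 200 plots, 3 abiotic variables,
# 4 species whose counts depend on the abiotic variables.
rng = np.random.default_rng(0)
abiotic = rng.normal(size=(200, 3))
weights = rng.normal(size=(3, 4))
abundances = rng.poisson(np.exp(0.5 * abiotic @ weights)).astype(float)

# Hold out the last 50 plots to mimic prediction at unsampled locations.
train, test = slice(0, 150), slice(150, 200)
step1, step2 = fit_two_step(abiotic[train], abundances[train])
predicted = predict_two_step(step1, step2, abiotic[test])
print(predicted.shape)
```

In this sketch the step-2 models are trained on in-sample step-1 predictions, which is optimistic; using out-of-fold predictions for that stage would be one way to reduce leakage. Holding out a whole year rather than a set of plots would correspond to the temporal evaluation mentioned in the abstract.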
