Automated Discovery of Relationships, Models, and Principles in Ecology

Ecological systems are the quintessential complex systems, involving numerous high-order interactions and non-linear relationships. The most widely used statistical modeling techniques struggle to accommodate the complexity of ecological patterns and processes. Finding hidden relationships in complex data is now possible thanks to massive computational power, particularly through artificial intelligence and machine learning methods. Here we explore the potential in ecology of symbolic regression (SR), a technique commonly used in other fields. Symbolic regression searches simultaneously for the formal structure of equations and for their fitted parameters, and therefore provides the flexibility required to characterize complex ecological systems. Although the method presented here is automated, it is part of a collaborative human–machine effort, and we demonstrate ways of carrying out that collaboration. First, we test the robustness of SR to extreme levels of noise when searching for the species-area relationship. Second, we demonstrate how SR can model species richness and spatial distributions. Third, we illustrate how SR can be used to find general models in ecology, namely new formulas for species richness estimators and the general dynamic model of oceanic island biogeography. We propose that evolving free-form equations purely from data, often without prior human inference or hypotheses, may be a very powerful tool that helps ecologists and biogeographers uncover hidden relationships and suggest general theoretical models and principles.
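To make "searching for structure and parameters at once" concrete, the following is a minimal enumerative sketch of symbolic regression, not the authors' pipeline. It assumes a tiny hypothetical grammar (the unary primitives log, sqrt and square applied to area x), wraps every candidate feature g(x) in an affine model a + b*g(x) whose two coefficients are fitted by least squares, and keeps the Pareto front of error versus expression size. Synthetic data follow the power-law species-area relationship S = c*A^z with multiplicative log-normal noise; the names `simulate_sar` and `symbolic_regression` and the values c = 10, z = 0.25 are illustrative assumptions. Evolutionary SR tools explore a far larger space with genetic programming instead of exhaustive enumeration.

```python
import itertools
import math
import random

# Hypothetical grammar: unary primitives applied to the area x.
UNARY = {
    "log": math.log,
    "sqrt": math.sqrt,
    "sq": lambda v: v * v,
}


def compose(outer, inner):
    """Return the function v -> outer(inner(v))."""
    return lambda v: outer(inner(v))


def enumerate_expressions(max_depth):
    """List (name, function, size) for x and every chain of up to max_depth primitives."""
    exprs = [("x", lambda v: v, 1)]
    frontier = list(exprs)
    for _ in range(max_depth):
        frontier = [
            (f"{op}({name})", compose(fn, f), size + 1)
            for (name, f, size), (op, fn) in itertools.product(frontier, UNARY.items())
        ]
        exprs.extend(frontier)
    return exprs


def fit_affine(g_vals, y):
    """Least-squares fit of y ~ a + b*g; returns (a, b, sse) or None if g is constant."""
    n = len(y)
    mg = sum(g_vals) / n
    my = sum(y) / n
    sgg = sum((g - mg) ** 2 for g in g_vals)
    if sgg == 0:
        return None  # a constant feature carries no slope information
    b = sum((g - mg) * (yy - my) for g, yy in zip(g_vals, y)) / sgg
    a = my - b * mg
    sse = sum((yy - (a + b * g)) ** 2 for g, yy in zip(g_vals, y))
    return a, b, sse


def pareto_front(candidates):
    """Keep candidates that no simpler candidate beats on error, sorted by size."""
    front = []
    best_sse = math.inf
    for cand in sorted(candidates, key=lambda c: (c["size"], c["sse"])):
        if cand["sse"] < best_sse:
            front.append(cand)
            best_sse = cand["sse"]
    return front


def symbolic_regression(xs, ys, max_depth=3):
    """Search expression structures and fit their coefficients; return the Pareto front."""
    candidates = []
    for name, f, size in enumerate_expressions(max_depth):
        try:
            g_vals = [f(x) for x in xs]
        except (ValueError, OverflowError, ZeroDivisionError):
            continue  # expression undefined somewhere on the data, e.g. log(log(1))
        fit = fit_affine(g_vals, ys)
        if fit is not None:
            a, b, sse = fit
            candidates.append({"expr": name, "size": size, "a": a, "b": b, "sse": sse})
    return pareto_front(candidates)


def simulate_sar(areas, c=10.0, z=0.25, noise_sd=0.0, seed=1):
    """Synthetic species-area data S = c * A**z with multiplicative log-normal noise."""
    rng = random.Random(seed)
    return [c * a ** z * math.exp(rng.gauss(0.0, noise_sd)) for a in areas]
```

With noise-free data over `areas = [2.0 ** k for k in range(11)]`, the lowest-error entry of the front is `sqrt(sqrt(x))` with b close to 10, i.e. the grammar rediscovers the exponent z = 0.25 structurally. Raising `noise_sd` shows how quickly such recovery degrades, which is the robustness question posed by the first case study. Returning a whole front rather than a single winner leaves the final choice of model to the analyst.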
