Machine Learning Methods Without Tears: A Primer for Ecologists

Machine learning methods, a family of statistical techniques with origins in the field of artificial intelligence, are widely recognized as holding great promise for advancing the understanding and prediction of ecological phenomena. These modeling techniques are flexible enough to handle complex problems with multiple interacting elements. They also typically outperform traditional approaches (e.g., generalized linear models) in predictive accuracy, which makes them well suited to modeling ecological systems. Despite these advantages, a review of the literature reveals only modest use of these approaches in ecology compared with other disciplines. One potential explanation for this limited uptake is that machine learning techniques do not fall neatly into the class of statistical modeling approaches with which most ecologists are familiar. In this paper, we introduce three machine learning approaches that ecologists can use broadly: classification and regression trees, artificial neural networks, and evolutionary computation. For each approach, we provide a brief background to the methodology, give examples of its application in ecology, describe model development and implementation, discuss strengths and weaknesses, survey the available statistical software, and provide an illustrative example. Although the ecological application of machine learning approaches has increased, considerable skepticism remains about the role of these techniques in ecology. Our review encourages a greater understanding of machine learning approaches and promotes their future application. It also provides a basis from which ecologists can make informed decisions about whether to select or avoid these approaches in their future modeling endeavors.
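To make the first of the three approaches concrete, the sketch below shows the core mechanism of a regression tree. The tree repeatedly chooses the binary split of a predictor that most reduces within-node impurity. Here impurity is measured as the residual sum of squares, which is a common choice for regression trees. Each leaf then predicts the mean response of the observations it contains. This is a minimal illustration under stated assumptions, not the authors' implementation. It handles a single predictor, and the function names `best_split`, `grow`, and `predict` and the toy data are hypothetical. It omits the pruning, multiple predictors, and cross-validation that a full CART implementation would include.

```python
from typing import List, Optional, Tuple, Union

# A tree is either a leaf (float: mean response) or an internal node (dict).
Tree = Union[float, dict]


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the node mean (regression impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: List[float], y: List[float]) -> Tuple[Optional[float], float]:
    """Return (threshold, cost) of the split x <= threshold that minimizes
    the summed SSE of the two child nodes; threshold is None if no split exists."""
    pairs = sorted(zip(x, y))
    best_t, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        # Only split between distinct predictor values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint threshold
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


def grow(x: List[float], y: List[float], depth: int = 0, max_depth: int = 2) -> Tree:
    """Recursively partition the data; each leaf predicts its mean response."""
    t, _ = best_split(x, y)
    if depth >= max_depth or t is None:
        return sum(y) / len(y)
    left = [(xi, yi) for xi, yi in zip(x, y) if xi <= t]
    right = [(xi, yi) for xi, yi in zip(x, y) if xi > t]
    return {
        "threshold": t,
        "left": grow([p[0] for p in left], [p[1] for p in left], depth + 1, max_depth),
        "right": grow([p[0] for p in right], [p[1] for p in right], depth + 1, max_depth),
    }


def predict(node: Tree, xi: float) -> float:
    """Follow the splits from the root down to a leaf and return its mean."""
    while isinstance(node, dict):
        node = node["left"] if xi <= node["threshold"] else node["right"]
    return node


if __name__ == "__main__":
    # Hypothetical toy data: a habitat variable vs. a species' abundance.
    x = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
    y = [2.0, 3.0, 2.0, 10.0, 11.0, 12.0]
    tree = grow(x, y, max_depth=1)
    print(tree)
    print(predict(tree, 1.1))
```

On these toy data the single split falls at about 0.7, which separates the low-abundance and high-abundance sites. Raising `max_depth` lets the tree keep partitioning each child node. This recursive, axis-aligned partitioning is what lets trees capture interactions and thresholds without a prespecified functional form.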