A review of supervised machine learning algorithms and their applications to ecological data

In this paper we present a general overview of several supervised machine learning (ML) algorithms and illustrate their use for predicting mass mortality events in the coastal rocky benthic communities of the NW Mediterranean Sea. In the first part we present the general framework of ML conceptually and explain the basis of the underlying theory. In the second part we describe several prominent ML techniques for analysing ecological data. In the third part we present our ecological problem and illustrate the techniques described with our data. Finally, we briefly summarize extensions of several of these methods to multi-class output prediction.
