Learning habitat models for the diatom community in Lake Prespa

Habitat suitability modelling studies the influence of abiotic factors on the abundance or diversity of a given taxonomic group of organisms. In this work, we investigate the effect of environmental conditions in Lake Prespa (Republic of Macedonia) on its diatom communities. The data contain measurements of physical and chemical properties of the environment together with the relative abundances of 116 diatom taxa. From these data we also derive a second dataset restricted to the 10 most abundant diatom taxa. We model the data with two machine learning techniques: regression trees and multi-target regression trees. For each of the 10 most abundant taxa, we learn a separate regression tree to identify the environmental conditions that influence the abundance of that taxon. We also learn two multi-target regression trees: one for the complete community and one for the 10 most abundant diatoms. Unlike approaches that model a single target variable at a time, multi-target regression trees can detect the conditions that affect the structure of the diatom community as a whole. We interpret and compare the resulting models. They capture knowledge about the influence of metallic ions and nutrients on the structure of the diatom community that is consistent with, and further extends, existing expert knowledge.
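
To make the difference between single-target and multi-target regression trees concrete, the sketch below shows one common way a multi-target tree can choose a split: it picks the threshold that minimises the size-weighted sum of per-target variances, so a single test on an environmental variable must explain several taxa at once. This is an illustration under stated assumptions, not the implementation used in the study: the function names, the toy measurements, and the unnormalised variance heuristic are all hypothetical, and practical systems typically normalise each target before summing.

```python
from statistics import pvariance


def summed_variance(rows):
    """Sum of per-target population variances over a list of target vectors."""
    if not rows:
        return 0.0
    n_targets = len(rows[0])
    return sum(pvariance([row[t] for row in rows]) for t in range(n_targets))


def best_split(x, Y):
    """Find the threshold on one feature that minimises the size-weighted
    summed variance of the target vectors in the two resulting subsets.

    Returns (threshold, score); threshold is None if no split improves on
    the unsplit data.
    """
    n = len(x)
    values = sorted(set(x))
    best = (None, summed_variance(Y))  # baseline: no split
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2  # candidate threshold between adjacent values
        left = [y for xi, y in zip(x, Y) if xi <= thr]
        right = [y for xi, y in zip(x, Y) if xi > thr]
        score = (len(left) * summed_variance(left)
                 + len(right) * summed_variance(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best


# Hypothetical toy data: one environmental variable and the relative
# abundances of two taxa at six sampling points (values are invented).
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
Y = [[5.0, 1.0], [6.0, 1.0], [5.0, 2.0],
     [1.0, 8.0], [2.0, 9.0], [1.0, 8.0]]

threshold, score = best_split(x, Y)
print(threshold, score)
```

A single-target regression tree corresponds to the special case where each row of `Y` holds one value; learning one such tree per taxon gives taxon-specific conditions, whereas the multi-target criterion favours splits that separate whole community compositions.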
