Comparison of modelling techniques to predict macroinvertebrate community composition in rivers of Ethiopia

Abstract To fulfil the Millennium Development Goals and ensure environmental sustainability in Ethiopia, ecological indicator systems can help river managers analyse the status of watercourses and select critical restoration actions. To use macroinvertebrates as tools for monitoring and assessing river water quality, Ethiopia needs data from both reference and disturbed conditions of surface water ecosystems. In this context, macroinvertebrate, structural and physical–chemical data were collected in the Gilgel Gibe river basin in south-western Ethiopia during the period 2005–2008. In the next stage, ecological metrics were compared for their assessment relevance. In the present paper, classification trees and support vector machines were used to induce models describing the relation between river characteristics and the ecological condition of these streams. Greedy stepwise and genetic search algorithms improved both the performance and the interpretability of these models by selecting the variables used as model inputs. The resulting models made it possible to identify the major variables affecting river quality. These tools can support river managers in their decision-making on the status of rivers and potential restoration options, for example by providing rules on the critical values of major river characteristics at which certain actions should be undertaken.
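The greedy stepwise variable selection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract pairs the search with classification trees and support vector machines, whereas this example swaps in a simple nearest-centroid classifier scored by leave-one-out accuracy so that it runs with the Python standard library alone. The function names, the three input columns and all data values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def loo_accuracy(X: Sequence[Sequence[float]], y: Sequence[str],
                 features: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the given feature columns (stand-in scorer)."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        # Build per-class sums from every sample except the held-out one.
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def sq_dist(label: str) -> float:
            # Squared distance from the held-out sample to a class centroid.
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        predicted = min(sums, key=sq_dist)
        correct += predicted == y[i]
    return correct / len(X)


def greedy_forward_selection(X: Sequence[Sequence[float]],
                             y: Sequence[str]) -> Tuple[List[int], float]:
    """Add one variable at a time, keeping the one that raises the score
    most, and stop as soon as no candidate improves it."""
    remaining = list(range(len(X[0])))
    selected: List[int] = []
    best = 0.0
    while remaining:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        # Ties are broken in favour of the lower column index.
        top_score, top_f = max(scored, key=lambda t: (t[0], -t[1]))
        if top_score <= best:
            break
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected, best


# Hypothetical toy data: column 0 separates the classes, column 1 is
# noise, column 2 is constant. These values are invented for illustration.
X = [
    [8.1, 3.0, 1.0], [7.9, 9.0, 1.0], [8.3, 5.0, 1.0], [7.7, 1.0, 1.0],
    [3.2, 2.0, 1.0], [2.9, 8.0, 1.0], [3.5, 6.0, 1.0], [3.0, 4.0, 1.0],
]
y = ["good"] * 4 + ["poor"] * 4

selected, best = greedy_forward_selection(X, y)
print(selected, best)
```

On this toy input the search keeps only the informative column and stops, which mirrors the interpretability argument in the abstract: fewer inputs give a model that is easier to read. A genetic search would explore subsets of columns by mutation and recombination instead of adding them one at a time.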
