Comparison of models for predicting the changes in phytoplankton community composition in the receiving water system of an inter-basin water transfer project.

Inter-basin water transfer projects may cause complex hydro-chemical and biological variation in the receiving aquatic ecosystems. Whether machine learning models can be used to predict changes in phytoplankton community composition caused by water transfer projects has rarely been studied. In the present study, we used machine learning models to predict the total algal cell densities and changes in phytoplankton community composition in Miyun reservoir caused by the middle route of the South-to-North Water Transfer Project (SNWTP). The performance of four machine learning models, namely regression trees (RT), random forest (RF), support vector machine (SVM), and artificial neural network (ANN), was evaluated, and the best model was selected for further prediction. The predictive accuracies (Pearson's correlation coefficient) of the models were RF (0.974), ANN (0.951), SVM (0.860), and RT (0.817) in the training step and RF (0.806), ANN (0.734), SVM (0.730), and RT (0.692) in the testing step. The RF model therefore performed best at estimating total algal cell densities. Furthermore, the predictive accuracies of the RF model for the dominant phytoplankton phyla (Cyanophyta, Chlorophyta, and Bacillariophyta) in Miyun reservoir ranged from 0.824 to 0.869 in the testing step. The predicted changes in the proportions of the different phytoplankton phyla with water transfer ranged from -8.88% to 9.93%, and the predicted dominant phyla with water transfer in each season remained unchanged compared to the phytoplankton succession without water transfer. The results of the present study provide a useful tool for predicting the changes in phytoplankton community caused by water transfer. The method is transferable to other locations by building models with data relevant to the particular area. Our findings help to better understand the possible changes in aquatic ecosystems influenced by inter-basin water transfer.
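The accuracy measure used above, Pearson's correlation coefficient between observed and predicted values, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `pearson_r` and `rank_models` and the toy numbers are assumptions introduced only for illustration, and no data from the study are reproduced.

```python
import math


def pearson_r(observed, predicted):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)


def rank_models(observed, predictions_by_model):
    """Return (model name, r) pairs sorted from highest to lowest correlation."""
    scores = {
        name: pearson_r(observed, predicted)
        for name, predicted in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy values, not measurements from Miyun reservoir.
    observed = [1.0, 2.0, 3.0, 4.0]
    predictions = {
        "model_a": [1.0, 2.0, 3.0, 4.0],
        "model_b": [1.0, 3.0, 2.0, 4.0],
    }
    for name, r in rank_models(observed, predictions):
        print(f"{name}: r = {r:.3f}")
```

Computing the coefficient separately on the training and testing data, as the abstract reports, shows how much of a model's apparent accuracy carries over to data it was not fitted on.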
