Prediction of Chlorophyll-a Concentrations in the Nakdong River Using Machine Learning Methods

Many studies have attempted to predict chlorophyll-a concentrations using multiple regression models validated with a hold-out technique. In this study, commonly used machine learning models, such as Support Vector Regression, Bagging, Random Forest, Extreme Gradient Boosting (XGBoost), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM), are used to build a new model to predict chlorophyll-a concentrations in the Nakdong River, Korea. We employed one-step-ahead recursive prediction to reflect the characteristics of the time series data. To increase prediction accuracy, the models were constructed using forward variable selection. The fitted models were validated by cumulative learning and rolling window learning rather than the hold-out technique. The best results were obtained when the chlorophyll-a concentration was predicted by combining the RNN model with the rolling window learning method. The results suggest that the selection of explanatory variables and one-step-ahead recursive prediction are important for improving the prediction performance of machine learning models.
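The difference between cumulative learning and rolling window learning can be illustrated with a minimal sketch. It assumes that both schemes refit the model at every time step and forecast only the next observation; the study's exact procedure may differ. The series is synthetic, a least-squares linear model stands in for the machine learning models named above, and all names (`walk_forward`, `fit_linear`, `initial`, `window`) are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; a stand-in for any regressor."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, x_row):
    """Predict a single target value from one feature row."""
    return coef[0] + x_row @ coef[1:]

def walk_forward(X, y, initial, window=None):
    """One-step-ahead validation: refit before each prediction.

    window=None -> cumulative learning (train on all samples before t).
    window=k    -> rolling window learning (train on the k samples before t).
    """
    preds = []
    for t in range(initial, len(y)):
        start = 0 if window is None else max(0, t - window)
        coef = fit_linear(X[start:t], y[start:t])
        preds.append(predict_linear(coef, X[t]))
    return np.array(preds)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic weekly series (assumed for illustration, not data from the study).
rng = np.random.default_rng(0)
n = 120
temp = 15 + 10 * np.sin(np.arange(n) * 2 * np.pi / 52) + rng.normal(0, 1, n)
chl = np.empty(n)
chl[0] = 20.0
for t in range(1, n):
    chl[t] = 0.6 * chl[t - 1] + 0.5 * temp[t - 1] + rng.normal(0, 1)

# Features: previous chlorophyll-a and previous temperature; target: current value.
X = np.column_stack([chl[:-1], temp[:-1]])
y = chl[1:]

initial = 52  # number of samples reserved for the first fit
cum_preds = walk_forward(X, y, initial)               # cumulative learning
roll_preds = walk_forward(X, y, initial, window=52)   # rolling window learning

print("cumulative RMSE:", rmse(y[initial:], cum_preds))
print("rolling    RMSE:", rmse(y[initial:], roll_preds))
```

Unlike a single hold-out split, both schemes only ever train on observations that precede the one being predicted. The rolling window additionally discards older samples, which may help when the relationship between the predictors and chlorophyll-a drifts over time.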
