A predictive model of recreational water quality based on adaptive synthetic sampling algorithms and machine learning.

Predicting recreational water quality is one of the most difficult tasks in water management, with major implications for humans and society. Many data-driven models have been used to predict water quality indicators to allow a real-time assessment of public health risk. This assessment is most commonly based on Faecal Indicator Bacteria (FIB), with FIB values compared against thresholds published in guidelines. However, FIB values tend to be imbalanced within water quality datasets: only a small proportion of observations exceed the guideline thresholds, while far more do not. This can limit the uptake of model predictions since, even if the overall accuracy is high, the sensitivity of the predictions (the proportion of exceedances correctly identified) can be low. To address this issue, this paper proposes using the adaptive synthetic sampling algorithm (ADASYN) to generate synthetic above-threshold FIB instances and tests the validity of the approach for the prediction of recreational water quality. The models in this paper are based on four machine learning techniques (k-nearest neighbour, boosting decision tree, support vector machine, and multi-layer perceptron artificial neural network) and are applied to five different locations in Auckland, New Zealand. Aside from the support vector machine, all models provide favourable predictions with relatively high sensitivity (around 75%) and overall accuracy (over 90%), indicating that both compliant and exceedance conditions can be effectively predicted through more sophisticated model training that involves artificial data. Considering model accuracy and stability, the boosting decision tree (BDT) and the multi-layer perceptron artificial neural network (MLP-ANN) are the two best models, and the multi-layer perceptron is the most efficient, with the shortest computation time.
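The imbalance problem and the ADASYN idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-feature toy dataset, the function names `adasyn` and `sensitivity_and_accuracy`, and the parameter defaults are all assumptions made for illustration. The first part shows how a model that always predicts "compliant" reaches high accuracy with zero sensitivity; the second generates synthetic exceedance samples by interpolating between minority-class neighbours, creating more samples for minority points that are surrounded by majority-class points.

```python
import math
import random


def sensitivity_and_accuracy(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); accuracy = correct / total."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return tp / (tp + fn), correct / len(pairs)


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return synthetic minority-class points (ADASYN-style sketch)."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(y) - n_min
    # Total number of synthetic points; beta=1.0 targets a balanced set.
    G = int((n_maj - n_min) * beta)
    if n_min < 2 or G <= 0:
        return []

    def neighbours(i, candidates):
        # The (up to) k candidates closest to point i, excluding i itself.
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # r_i: share of majority-class points among the k nearest neighbours
    # of each minority point (a measure of how hard it is to learn).
    ratios = []
    for i in min_idx:
        nn = neighbours(i, range(len(X)))
        ratios.append(sum(y[j] != minority for j in nn) / len(nn))
    total = sum(ratios)
    weights = [r / total for r in ratios] if total else [1 / n_min] * n_min

    synthetic = []
    for i, w in zip(min_idx, weights):
        same_class = neighbours(i, min_idx)
        for _ in range(round(w * G)):
            j = rng.choice(same_class)
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Hypothetical toy data: 95 compliant samples (label 0), 5 exceedances (label 1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(95)]
X += [[data_rng.gauss(2.5, 0.5), data_rng.gauss(2.5, 0.5)] for _ in range(5)]
y = [0] * 95 + [1] * 5

# A trivial model that always predicts "compliant".
sens, acc = sensitivity_and_accuracy(y, [0] * len(y))

# Synthetic exceedance samples to add to the training set only.
synthetic = adasyn(X, y)
X_balanced = X + synthetic
y_balanced = y + [1] * len(synthetic)
```

In this sketch the synthetic points are meant to be appended to the training data only, so that test-set sensitivity and accuracy are still measured on real observations.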
