Improving prediction of water quality indices using novel hybrid machine-learning algorithms.

River water quality assessment is one of the most important tasks in improving water resources management plans. A water quality index (WQI) considers several water quality variables simultaneously. Traditionally, WQI calculations are time-consuming and often error-prone during the derivation of sub-indices. In this study, four standalone algorithms (random forest (RF), M5P, random tree (RT), and reduced error pruning tree (REPT)) and 12 hybrid data-mining algorithms (combinations of the standalones with bagging (BA), CV parameter selection (CVPS), and randomizable filtered classification (RFC)) were used to predict the Iran WQI (IRWQIsc). Six years (2012 to 2018) of monthly data from two water quality monitoring stations within the Talar catchment were compiled. Ten different input combinations were constructed on the basis of Pearson correlation coefficients. The data were divided into two groups at a 70:30 ratio, for model building (training dataset) and model validation (testing dataset), using a 10-fold cross-validation technique. The models were evaluated using several statistical and visual evaluation metrics. Results show that fecal coliform (FC) and total solids (TS) had the greatest and least effects, respectively, on the prediction of IRWQIsc. The best input combinations varied among the algorithms; in general, combinations that included variables with very low correlations performed worse. Hybridization improved the predictive power of some, but not all, of the standalone models. Hybrid BA-RT outperformed the other models (R² = 0.941, RMSE = 2.71, MAE = 1.87, NSE = 0.941, PBIAS = 0.500). PBIAS indicated that all algorithms, with the exception of RT, BA-RT, and CVPS-REPT, overestimated WQI values.
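The abstract does not say exactly how the ten input combinations were derived from the Pearson coefficients. One common scheme, assumed here rather than taken from the study, ranks the candidate variables by the absolute value of their correlation with the index and then adds them one at a time, so that the k-th combination holds the k most strongly correlated inputs. The helper names `pearson` and `nested_combinations` and the toy values below are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (undefined if either series is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def nested_combinations(
    features: Dict[str, Sequence[float]], target: Sequence[float]
) -> List[List[str]]:
    """Hypothetical input-selection scheme: nested sets ordered by |r| with the target."""
    # Rank candidate inputs from strongest to weakest absolute correlation.
    ranked = sorted(
        features, key=lambda name: abs(pearson(features[name], target)), reverse=True
    )
    # Combination k contains the k most strongly correlated inputs.
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


# Toy example with made-up series (not the study's data).
toy_features = {"FC": [2, 4, 6, 8, 10], "TS": [3, 1, 4, 1, 5], "DO": [5, 3, 4, 2, 1]}
toy_index = [1, 2, 3, 4, 5]
print(nested_combinations(toy_features, toy_index))
```

Using the absolute value means a strongly negative correlation ranks as highly as a strongly positive one, which matters for variables that lower an index as they rise.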
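Bagging (bootstrap aggregating), the BA component of the hybrids, fits the same base learner to several bootstrap resamples of the training data and averages their predictions. M5P, REPT (REPTree) and RT (RandomTree) are learner names from the Weka toolkit; to keep this sketch dependency-free, a single-feature depth-1 regression tree (the hypothetical `fit_stump`) stands in for those base learners. It illustrates the bagging wrapper only, not the study's configuration or hyperparameters.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def fit_stump(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Depth-1 regression tree: one threshold on x, the mean of y in each leaf."""
    pairs = sorted(zip(xs, ys))
    mean_all = sum(ys) / len(ys)
    best = None  # (sse, threshold, left_mean, right_mean)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # every x identical: fall back to the overall mean
        return lambda x: mean_all
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def bagging(xs: Sequence[float], ys: Sequence[float], n_bags: int = 25, seed: int = 0) -> Model:
    """Average of base learners, each fitted to a bootstrap resample of (xs, ys)."""
    rng = random.Random(seed)
    n = len(xs)
    models: List[Model] = []
    for _ in range(n_bags):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


# Toy step-shaped data (not the study's data).
xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5
single = fit_stump(xs, ys)
print(single(0), single(9))  # 0.0 10.0
ensemble = bagging(xs, ys, n_bags=25, seed=1)
print(ensemble(0), ensemble(9))
```

Averaging over resamples mainly reduces the variance of unstable learners such as unpruned random trees, which is one plausible reason a bagged tree could outperform its standalone counterpart while bagging gives little gain for already stable learners.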
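The evaluation statistics quoted for BA-RT can be computed from paired observed and predicted index values as sketched below. Two definitions are assumptions: R² is taken as the squared Pearson correlation between observed and predicted values (some studies instead report the coefficient of determination, which coincides with NSE), and PBIAS follows the Moriasi et al. (2007) convention, PBIAS = 100 · Σ(obs − pred) / Σ obs, where positive values indicate underestimation and negative values overestimation. Under that convention, the positive PBIAS reported for BA-RT is consistent with its listing among the models that did not overestimate. The function name `evaluate` is hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluate(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Goodness-of-fit metrics for observed vs. predicted index values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Squared and absolute residuals.
    sse = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    sae = sum(abs(p - o) for o, p in zip(observed, predicted))

    # Sums of squares and cross-products about the means
    # (R2 is undefined if either series is constant).
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cross = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))

    return {
        # Assumed definition: squared Pearson correlation.
        "R2": cross ** 2 / (ss_obs * ss_pred),
        "RMSE": math.sqrt(sse / n),
        "MAE": sae / n,
        # Nash-Sutcliffe efficiency: 1 is a perfect fit; below 0 is worse than the observed mean.
        "NSE": 1.0 - sse / ss_obs,
        # Percent bias, Moriasi et al. (2007) sign convention:
        # positive = underestimation, negative = overestimation.
        "PBIAS": 100.0 * sum(o - p for o, p in zip(observed, predicted)) / sum(observed),
    }


# Toy values (not the study's data).
print(evaluate([10, 20, 30, 40], [12, 22, 33, 37]))
```

For these toy values the predictions sum to more than the observations, so PBIAS comes out at -4.0, which this convention reads as overestimation.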
