Mapping groundwater contamination risk of multiple aquifers using multi-model ensemble of machine learning algorithms.

Constructing accurate and reliable groundwater risk maps provides a scientifically sound basis for strategic measures to protect and manage groundwater. The objectives of this paper are to design and validate machine-learning-based risk maps using an integrative, ensemble-based modelling approach. We apply extreme learning machines (ELM), multivariate adaptive regression splines (MARS), the M5 Tree and support vector regression (SVR) to multiple aquifer systems (e.g. unconfined, semi-confined and confined) in the Marand plain, North West Iran, and combine the merits of the individual learning algorithms in a final committee-based ANN model. The DRASTIC Vulnerability Index (VI) ranged from 56.7 to 128.1, corresponding to the no-risk, low and moderate vulnerability classes. The correlation coefficient (r) and Willmott's Index (d) between NO3 concentrations and VI were 0.64 and 0.314, respectively. To improve on the original DRASTIC method, the vulnerability indices were adjusted by NO3 concentrations to give the groundwater contamination risk (GCR). The seven DRASTIC parameters were used as inputs and the GCR values as target outputs of the individual machine learning models, and the outputs of the individual models were then supplied to the fully optimized committee-based ANN predictive model. Both metrics showed that the ELM and SVR models outperformed the MARS and M5 Tree models, with larger d and r values. The r and d metrics for the committee-based ANN multi-model in the testing phase were 0.8889 and 0.7913, respectively, revealing the superiority of the integrated (ensemble) machine learning model over the original DRASTIC approach. The newly designed multi-model ensemble approach can be considered a pragmatic step for mapping groundwater contamination risk in multiple aquifer systems, given the high accuracy of the ANN committee-based model.
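The two agreement metrics reported above, r and d, can be illustrated with a minimal sketch. It assumes the standard Pearson correlation and the original form of Willmott's index of agreement, d = 1 - Σ(O - P)² / Σ(|P - Ō| + |O - Ō|)²; the abstract does not say which variant of d was used, and the function names and sample values below are purely illustrative, not taken from the study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def willmott_d(observed, predicted):
    """Willmott's index of agreement (original form, assumed here)."""
    mean_o = sum(observed) / len(observed)
    # Sum of squared errors between observations and predictions.
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # "Potential error": deviations of both series from the observed mean.
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
              for o, p in zip(observed, predicted))
    return 1.0 - num / den


# Illustrative values only: predictions perfectly correlated with the
# observations but biased by a factor of two.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 4.0, 6.0, 8.0]
r_value = pearson_r(obs, pred)
d_value = willmott_d(obs, pred)
```

In this toy case r equals 1 while d stays below 1, because d penalizes systematic bias that correlation ignores. This is one reason the two metrics can diverge, as they do for the DRASTIC VI against NO3.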
