Improving GALDIT-based groundwater vulnerability predictive mapping using coupled resampling algorithms and machine learning models

Abstract Developing accurate groundwater vulnerability maps is essential for the sustainable management of groundwater resources. In this research, resampling methods [e.g., Bootstrap Aggregating (BA) and Disjoint Aggregating (DA)] are combined with machine learning (ML) models, namely eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), and Random Forest (RF), to improve the GALDIT groundwater vulnerability mapping framework, which considers Groundwater occurrence (G) (i.e., aquifer type), Aquifer hydraulic conductivity (A), depth to groundwater Level (L), Distance from the seashore (D), Impact of existing seawater intrusion status (I), and aquifer Thickness (T). The proposed approach overcomes the subjectivity of the weights and ratings assigned to the six GALDIT variables (via the ML models) and helps address the small-dataset issue (via the resampling methods) common to groundwater vulnerability predictive mapping. For the Shabestar Plain aquifer, situated northeast of Lake Urmia (Iran), the vulnerability indices predicted by GALDIT were adjusted using total dissolved solids (TDS, an indicator of drinking water quality) concentrations and modeled by the ML models. Pearson's correlation coefficient (r) and distance correlation (DC) between the predicted vulnerability indices and TDS were used to validate the models. Using a validation set, the GALDIT framework (r = 0.447 and DC = 0.511) was compared against the best-performing standalone (XGBoost-GALDIT, r = 0.613, DC = 0.647) and coupled resampling (BA-XGBoost-GALDIT, r = 0.659, DC = 0.699 and DA-RF-GALDIT, r = 0.616, DC = 0.662) ML models, revealing that the proposed framework significantly increases both r and DC. In general, the BA resampling method led to better-performing ML models than DA. However, in all cases, integrating resampling methods with ML models was found to be a promising approach for improving the accuracy of GALDIT vulnerability models.
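
The two validation metrics and the two resampling schemes named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, the distance correlation is the standard sample (V-statistic) estimator, DA is assumed here to mean a random partition of the training indices into disjoint subsets, and the toy arrays are synthetic placeholders rather than data from the study.

```python
import numpy as np


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def _double_centered_distances(v):
    """Pairwise |v_i - v_j| matrix with row, column and grand means removed."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) for 1-D arrays."""
    a = _double_centered_distances(np.asarray(x, dtype=float))
    b = _double_centered_distances(np.asarray(y, dtype=float))
    dcov2 = (a * b).mean()                            # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())  # product of distance std. devs.
    return float(np.sqrt(max(dcov2, 0.0) / denom)) if denom > 0 else 0.0


def bootstrap_subsets(n, n_subsets, rng):
    """BA-style resampling: each subset draws n indices with replacement."""
    return [rng.integers(0, n, size=n) for _ in range(n_subsets)]


def disjoint_subsets(n, n_subsets, rng):
    """DA-style resampling (assumed): shuffle indices, split into disjoint parts."""
    return np.array_split(rng.permutation(n), n_subsets)


# Synthetic toy example: a nonlinear link that Pearson's r misses but DC detects.
index = np.linspace(-1.0, 1.0, 21)   # stand-in for predicted vulnerability indices
target = index ** 2                  # stand-in for observed TDS
print("r  =", round(pearson_r(index, target), 3))
print("DC =", round(distance_correlation(index, target), 3))

rng = np.random.default_rng(0)
print("BA subset sizes:", [len(s) for s in bootstrap_subsets(21, 3, rng)])
print("DA subset sizes:", [len(s) for s in disjoint_subsets(21, 3, rng)])
```

In a coupled model of the kind described, one base learner would be fitted on each subset and the predictions averaged; BA subsets overlap and keep the full sample size, whereas DA subsets are smaller but share no observations.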
