Groundwater aquifer potential modeling using an ensemble multi-adaptive boosting logistic regression technique

Abstract Machine learning and data-driven models have gained a favorable reputation in advanced geospatial modeling, particularly for mapping groundwater aquifer potential over large areas. Models built with standalone machine learning techniques nevertheless retain uncertainty, including errors associated with the modeling process, the sampling approach, and the input hyper-parameters. Some of these techniques cannot be applied in data-scarce regions because high bias and variance can lead to oversimplification. In the current study, we therefore developed and validated a novel ensemble multi-adaptive boosting logistic regression (MABLR) model for groundwater aquifer potential mapping. The model was validated over a large area of the Gyeongsangbuk-do basin in South Korea, and its results were compared with those of other machine learning models: multilayer perceptron (MLP), logistic regression (LR), and support vector machine (SVM). A forward stepwise LR technique was used to assess the importance of the contributing morphological factors; 15 factors contributed significantly: topographic wetness index (TWI), topographic roughness index (TRI), stream power index (SPI), topographic position index (TPI), multi-resolution valley bottom flatness (MVBF), slope, aspect, slope length (LS), distance from the river, distance from the fault, profile curvature, plan curvature, altitude, land use/land cover (LULC), and geology. We optimized the MABLR model using a fuzzy logic supervised (FLS) approach with 184 iterations and then validated the results using accuracy assessment metrics: the κ coefficient, root-mean-square error (RMSE), the receiver operating characteristic (ROC) curve, and the precision-recall curve (PRC). Our model outperformed the other models tested, with higher overall goodness-of-fit and validation values according to the κ coefficient (0.819 and 0.781, respectively), ROC (0.917 and 0.838), and PRC (0.931 and 0.872). Our experimental results demonstrate that MABLR is more effective at reducing bias and variance error than the other machine learning methods tested.
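
The four accuracy metrics named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names, the toy labels and scores, and the 0.5 classification threshold are all assumptions made for illustration. The ROC value is computed as the area under the curve via the rank (Mann-Whitney) formulation, and the PRC value as step-wise average precision, which assumes no tied scores.

```python
import math

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary 0/1 labels: agreement corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_true1 = sum(y_true) / n
    p_pred1 = sum(y_pred) / n
    # Chance agreement: both say 1, or both say 0, independently.
    expected = p_true1 * p_pred1 + (1 - p_true1) * (1 - p_pred1)
    return (observed - expected) / (1 - expected)

def rmse(y_true, scores):
    """Root-mean-square error between 0/1 labels and predicted scores."""
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(y_true, scores)) / len(y_true))

def roc_auc(y_true, scores):
    """Area under the ROC curve: P(positive score > negative score), ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / sum(y_true)

# Hypothetical toy data: 1 = productive well location, 0 = absence sample.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("kappa:", cohen_kappa(y_true, y_pred))
print("RMSE: ", rmse(y_true, scores))
print("ROC:  ", roc_auc(y_true, scores))
print("PRC:  ", average_precision(y_true, scores))
```

Computing each metric once on the training samples and once on held-out samples would give paired goodness-of-fit and validation values of the kind reported above.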
