Groundwater spring potential mapping using population-based evolutionary algorithms and data mining methods.

Water scarcity has become an unpleasant reality in many regions of the world. Groundwater appears to be one of the main natural resources capable of reversing this situation. Uncovering the spatial patterns of groundwater occurrence is a crucial factor that could assist in carrying out successful water resources management projects. The main objective of the current study was to provide a novel methodological approach that uses a Genetic Algorithm (GA) to perform feature selection and data mining methods to generate a groundwater spring potential map. Three data mining methods, Naïve Bayes (NB), Support Vector Machine (SVM) and Random Forest (RF), were used to construct a groundwater spring potential map that had over 0.81 probability of occurrence for Wuqi County, Shaanxi Province, China. Groundwater spring locations and sixteen related variables were analyzed, namely: lithology, soil cover, land use cover, normalized difference vegetation index (NDVI), elevation, slope angle, aspect, planform curvature, profile curvature, curvature, stream power index (SPI), stream transport index (STI), topographic wetness index (TWI), mean annual rainfall, distance from river network and distance from road network. The frequency ratio method was used to weight the variables, and a multi-collinearity analysis was performed to identify the relations between the variables and to decide whether each should be used. The optimal set determined by the GA reduced the number of variables from sixteen to twelve by removing planform curvature, profile curvature, curvature and STI. The Receiver Operating Characteristic curve and the area under the curve (AUROC) were estimated to evaluate the predictive power of each model. The results indicated that the optimized models were more accurate than the original models. The optimized RF model produced the best results (0.9572), followed by the optimized SVM (0.9529) and the optimized NB (0.8235). Overall, the current study highlights the necessity of applying feature selection techniques in groundwater spring assessments and shows that data mining methods may be a powerful approach for groundwater spring potential mapping.
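The abstract names two quantities without defining them: the frequency ratio used to weight variable classes, and the AUROC used to compare models. The minimal Python sketch below illustrates both under their standard textbook definitions; it is not the authors' implementation. The function names, the `slope_class` variable and the ten-cell toy grid are hypothetical and chosen only for illustration.

```python
from collections import Counter


def frequency_ratio(cell_classes, spring_flags):
    """Frequency ratio per class: share of springs in the class divided by
    the share of all cells in the class. Values above 1 suggest a positive
    association between the class and spring occurrence."""
    total_cells = len(cell_classes)
    total_springs = sum(spring_flags)
    cells_per_class = Counter(cell_classes)
    springs_per_class = Counter(
        c for c, s in zip(cell_classes, spring_flags) if s
    )
    return {
        c: (springs_per_class[c] / total_springs) / (n / total_cells)
        for c, n in cells_per_class.items()
    }


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen spring cell scores
    higher than a randomly chosen non-spring cell (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy grid: ten cells, one categorical variable, four springs.
slope_class = ["low", "low", "low", "low", "mid", "mid", "mid",
               "high", "high", "high"]
spring = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

fr = frequency_ratio(slope_class, spring)

# Use each cell's class weight as a naive potential score and evaluate it.
scores = [fr[c] for c in slope_class]
auc = auroc(scores, spring)

print(fr)
print(round(auc, 4))
```

In this toy case the score is evaluated on the same cells that produced the weights, so the AUROC is optimistic; a real assessment would hold out a separate validation set of spring locations.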
