Groundwater potential mapping using C5.0, random forest, and multivariate adaptive regression spline models in GIS

The ever-increasing demand for water for different purposes makes a better understanding of water resources essential. Groundwater is one of the main water resources, especially in countries with arid climates. This study therefore seeks to produce groundwater potential maps (GPMs) with new algorithms, and evaluates the performance of the C5.0, random forest (RF), and multivariate adaptive regression splines (MARS) algorithms for generating GPMs in the eastern part of Mashhad Plain, Iran. For this purpose, a dataset was built with spring locations as the indicator of groundwater occurrence and groundwater-conditioning factors (GCFs) as inputs. Thirteen GCFs were selected: altitude, slope aspect, slope angle, plan curvature, profile curvature, topographic wetness index (TWI), slope length, distance from rivers, distance from faults, river density, fault density, land use, and lithology. The dataset was divided into training and validation subsets containing 70% and 30% of the springs, respectively. The C5.0, RF, and MARS algorithms were then run in the R statistical software, and their outputs were transformed into GPMs. Finally, two evaluation criteria were calculated: Kappa and the area under the receiver operating characteristic curve (AUC-ROC). MARS performed best, with an AUC-ROC of 84.2%, followed by RF and C5.0 with AUC-ROC values of 79.7% and 77.3%, respectively. All three models achieved AUC-ROC values above 70%, which indicates acceptable performance. In conclusion, the methodology could be applied in other geographical areas, and the GPMs could help water resource managers and related organizations accelerate and facilitate water resource exploitation.
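The 70/30 split and the two evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' R workflow: the function names, the 0.5 threshold, and the labels and scores below are hypothetical values chosen only to show how Kappa and AUC-ROC are computed for a presence/absence classifier.

```python
import random


def train_validation_split(items, train_fraction=0.7, seed=0):
    """Shuffle items and split them into training and validation subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    labels = set(y_true) | set(y_pred)
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (observed - expected) / (1 - expected)


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation data: 1 = spring present, 0 = spring absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # assumed model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

train_ids, validation_ids = train_validation_split(range(10))
print(len(train_ids), len(validation_ids))
print(cohen_kappa(y_true, y_pred), auc_roc(y_true, scores))
```

Kappa depends on the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is one reason the two criteria are often reported together.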
