Application of rotation forest with decision trees as base classifier and a novel ensemble model in spatial modeling of groundwater potential

Groundwater resources are under heavy pressure from drought and overexploitation. The main aim of this research is to apply rotation forest (RTF) with decision trees as base classifiers, together with an improved ensemble methodology based on the evidential belief function and tree-based models (EBFTM), to prepare groundwater potential maps (GPMs). The performance of these new models is then compared with that of three previously implemented models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). For this purpose, spring locations in Meshgin Shahr, Iran, were identified and randomly divided into training (70% of the locations) and validation (30% of the locations) datasets. Several groundwater conditioning factors (GCFs), including hydrogeological, topographical, and land use factors, were mapped and used as input variables. The tree-based algorithms (BRT, CART, RF, and RTF) were trained on these input variables using the training dataset. The groundwater potential values (i.e., spring occurrence probabilities) produced by the BRT, CART, RF, and RTF models for every pixel of the study area were classified into four potential classes and then used as inputs to the EBF model to construct the new ensemble model (EBFTM). Finally, receiver operating characteristic (ROC) curves were used to evaluate the performance of the EBFTM, RTF, BRT, CART, and RF models. The results showed that the EBFTM performed best, with an area under the ROC curve (AUC) of 90.4%, followed by the RF, BRT, CART, and RTF models with AUC values of 90.1, 89.8, 86.9, and 86.2%, respectively. These results suggest that the ensemble approach can improve the performance of single tree-based models in GPM production.
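The abstract does not spell out the EBF equations. One common data-driven formulation based on Dempster-Shafer theory computes, for each class of a classified map, a belief (Bel), a disbelief (Dis), and an uncertainty (Unc = 1 - Bel - Dis), and then merges the maps with Dempster's rule of combination. The sketch below assumes that formulation, assumes quantile breaks for the four potential classes (the abstract does not say which classification scheme was used), and runs on synthetic data rather than the study's rasters; the helper names (classify_quantiles, ebf_masses, combine, ebftm) are hypothetical.

```python
import numpy as np


def classify_quantiles(prob, n_classes=4):
    """Assign each pixel to a class 0..n_classes-1 using quantile breaks (assumed scheme)."""
    breaks = np.quantile(prob, np.linspace(0.0, 1.0, n_classes + 1)[1:-1])
    return np.digitize(prob, breaks)


def ebf_masses(classes, springs, n_classes=4):
    """Data-driven Bel, Dis and Unc for each class of one classified map."""
    n_area = classes.size          # N(A): all pixels
    n_springs = springs.sum()      # N(L): training spring pixels
    w_t = np.zeros(n_classes)
    w_tbar = np.zeros(n_classes)
    for j in range(n_classes):
        in_class = classes == j
        n_c = in_class.sum()                   # N(C_j): pixels in class j
        n_lc = (in_class & springs).sum()      # N(L and C_j): springs in class j
        # Weight supporting "spring present" in class j
        w_t[j] = (n_lc / n_springs) / ((n_c - n_lc) / (n_area - n_springs))
        # Weight supporting "spring absent" in class j
        w_tbar[j] = ((n_springs - n_lc) / n_springs) / (
            (n_area - n_springs - n_c + n_lc) / (n_area - n_springs))
    bel = w_t / w_t.sum()
    dis = w_tbar / w_tbar.sum()
    unc = 1.0 - bel - dis
    return bel, dis, unc


def combine(m1, m2):
    """Dempster's rule for two (Bel, Dis, Unc) triples, applied pixel by pixel."""
    bel1, dis1, unc1 = m1
    bel2, dis2, unc2 = m2
    beta = 1.0 - bel1 * dis2 - dis1 * bel2  # renormalize after removing conflicting mass
    bel = (bel1 * bel2 + bel1 * unc2 + bel2 * unc1) / beta
    dis = (dis1 * dis2 + dis1 * unc2 + dis2 * unc1) / beta
    unc = (unc1 * unc2) / beta
    return bel, dis, unc


def ebftm(prob_maps, springs, n_classes=4):
    """Classify each model's probability map, derive EBF masses, and fuse all maps."""
    fused = None
    for prob in prob_maps:
        cls = classify_quantiles(prob, n_classes)
        bel, dis, unc = ebf_masses(cls, springs, n_classes)
        masses = (bel[cls], dis[cls], unc[cls])  # per-pixel masses
        fused = masses if fused is None else combine(fused, masses)
    return fused


# Synthetic demo: four hypothetical model outputs that score spring pixels higher
rng = np.random.default_rng(0)
n = 10_000
springs = rng.random(n) < 0.02
maps = [0.3 + 0.4 * springs + rng.normal(0.0, 0.2, n) for _ in range(4)]
bel, dis, unc = ebftm(maps, springs)
print("mean Bel at springs:", bel[springs].mean())
print("mean Bel elsewhere:", bel[~springs].mean())
```

Under this assumed formulation, the fused Bel layer would play the role of the ensemble potential index; Bel, Dis, and Unc still sum to 1 at every pixel after each combination step because beta is exactly the non-conflicting mass.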
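The evaluation step (a random 70/30 split of spring locations, then AUC on the validation set) can be sketched as follows. This assumes the validation sample contains spring points (label 1) and non-spring points (label 0); the abstract does not say how non-spring points were chosen, and split_locations and auc_roc are hypothetical helper names. AUC is computed here with the Mann-Whitney identity: the probability that a randomly chosen spring receives a higher score than a randomly chosen non-spring point, with ties counted as one half. This equals the trapezoidal area under the empirical ROC curve.

```python
import numpy as np


def split_locations(n_points, train_frac=0.7, seed=0):
    """Randomly split location indices into training and validation subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_points)
    n_train = int(round(train_frac * n_points))
    return idx[:n_train], idx[n_train:]


def auc_roc(scores, labels):
    """AUC via the Mann-Whitney statistic: P(score_pos > score_neg), ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    # Compare every positive (spring) score with every negative (non-spring) score
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


train_idx, valid_idx = split_locations(100)
print(len(train_idx), len(valid_idx))  # 70 30

# Toy validation set: two springs (label 1) and two non-spring points (label 0)
print(auc_roc([0.9, 0.6, 0.6, 0.2], [1, 1, 0, 0]))  # 0.875
```

Multiplying the result by 100 gives the percentage form used for the AUC values reported above. The pairwise comparison costs O(n_pos x n_neg), which is acceptable for validation sets of a few hundred points; a rank-based formula would scale better for larger samples.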
