Influence of Data Splitting on Performance of Machine Learning Models in Prediction of Shear Strength of Soil

This study evaluates and compares three machine learning (ML) algorithms, namely Artificial Neural Network (ANN), Extreme Learning Machine (ELM), and Boosting Trees (Boosted), under different training/testing ratios for predicting soil shear strength, one of the most critical geotechnical properties in civil engineering design and construction. A database of 538 soil samples collected from the Long Phu 1 power plant project, Vietnam, was used to generate the datasets for modeling. The data were divided into training and testing sets at nine ratios (10/90, 20/80, 30/70, 40/60, 50/50, 60/40, 70/30, 80/20, and 90/10). Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Correlation Coefficient (R) were used to assess predictive capability at each ratio. In addition, Monte Carlo simulation was carried out to account for the effect of random sampling on model performance. Although all three ML models performed well, the ANN was the most accurate and statistically stable after 1000 Monte Carlo simulations (mean R = 0.9348), compared with Boosted (mean R = 0.9192) and ELM (mean R = 0.8703). The predictive capability of the models was strongly affected by the training/testing ratio, and the 70/30 ratio gave the best performance. These results offer a practical basis for selecting an appropriate split ratio and ML model to predict soil shear strength accurately, which is helpful in the design and engineering phases of construction projects.
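The evaluation protocol described above (repeated random splits at each training/testing ratio, scored by RMSE, MAE, and R) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, a one-variable least-squares line stands in for the ANN, ELM, and Boosted models, the run count is reduced from 1000 to 100 for speed, and all function names are hypothetical.

```python
import math
import random
import statistics

# Training fractions matching the nine ratios 10/90 ... 90/10.
TRAIN_FRACTIONS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient R between targets and predictions."""
    mt, mp = statistics.fmean(y_true), statistics.fmean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def fit_line(x, y):
    """Stand-in model (assumption): ordinary least squares with one input."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def monte_carlo_split_study(x, y, train_fraction, n_runs, rng):
    """Repeat a random train/test split n_runs times and summarize test metrics."""
    n = len(x)
    n_train = round(train_fraction * n)
    scores = []
    for _ in range(n_runs):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random split on every run
        train, test = idx[:n_train], idx[n_train:]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        y_test = [y[i] for i in test]
        y_hat = [slope * x[i] + intercept for i in test]
        scores.append((rmse(y_test, y_hat), mae(y_test, y_hat), pearson_r(y_test, y_hat)))
    r_values = [s[2] for s in scores]
    return {
        "mean_rmse": statistics.fmean(s[0] for s in scores),
        "mean_mae": statistics.fmean(s[1] for s in scores),
        "mean_r": statistics.fmean(r_values),
        "std_r": statistics.stdev(r_values),  # spread of R across runs
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data (assumption): 538 points to mirror the sample count only.
    x = [rng.uniform(0.0, 10.0) for _ in range(538)]
    y = [2.0 * a + 1.0 + rng.gauss(0.0, 1.0) for a in x]
    for frac in TRAIN_FRACTIONS:
        res = monte_carlo_split_study(x, y, frac, n_runs=100, rng=rng)
        label = f"{round(frac * 100)}/{round((1 - frac) * 100)}"
        print(f"{label:>6}  RMSE={res['mean_rmse']:.4f}  MAE={res['mean_mae']:.4f}  "
              f"R={res['mean_r']:.4f} (std {res['std_r']:.4f})")
```

Reporting the standard deviation of R across runs alongside its mean is one way to express the statistical stability the abstract refers to. A real study would replace `fit_line` with the actual models and the synthetic data with the measured soil properties.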
