Ensemble data mining modeling in corrosion of concrete sewer: A comparative study of network-based (MLPNN & RBFNN) and tree-based (RF, CHAID, & CART) models

Abstract This research evaluates the potential of ensemble learning (bagging, boosting, and modified bagging) for predicting microbially induced concrete corrosion in sewer systems from a data mining (DM) perspective. The focus is on ensemble techniques for network-based DM methods, including the multi-layer perceptron neural network (MLPNN) and the radial basis function neural network (RBFNN), and for tree-based DM methods, such as the chi-square automatic interaction detector (CHAID), classification and regression tree (CART), and random forests (RF). The approach is interdisciplinary: it combines findings from materials science and hydrochemistry with data mining analyses to predict concrete corrosion. Factors that affect concrete corrosion, such as time, gas temperature, gas-phase H2S concentration, relative humidity, pH, and exposure phase, serve as the models' inputs. All 433 datasets are randomly sampled to construct, for each DM base learner, an individual model and twenty component models for boosting, bagging, and modified bagging, based on training, validation, and testing. The best ensemble predictive models are selected using several performance indices (e.g., root mean square error, RMSE; mean absolute percentage error, MAPE; correlation coefficient, r). The results indicate that the predictive ability of the random forests DM model is superior to that of the other ensemble learners, followed by the ensemble Bag-CHAID method. On average, the ensemble tree-based models performed better than the ensemble network-based models; nevertheless, ensemble learning was also found to improve the general performance of individual DM models by more than 10%.
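The abstract names three performance indices (RMSE, MAPE, r) and bagging with twenty component models. The Python sketch below illustrates both ideas in their textbook form. It is not the authors' implementation: the function names, the least-squares line used as a stand-in base learner, and the synthetic data are all assumptions made for illustration, and the study's actual base learners (MLPNN, RBFNN, CHAID, CART) are not reproduced here.

```python
import math
import random


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error in percent (assumes no observed value is zero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Pearson correlation coefficient r between observed and predicted values."""
    mo = sum(obs) / len(obs)
    mp = sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def fit_line(xs, ys):
    """Hypothetical stand-in base learner: a one-variable least-squares line."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    # Fall back to a constant predictor if the resample has no spread in x.
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def bagging_fit(xs, ys, n_models=20, seed=0):
    """Train n_models component models, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Ensemble prediction: the average of the component model predictions."""
    return sum(m(x) for m in models) / len(models)


# Synthetic, noise-free data purely for demonstration (not the study's data).
xs = [float(i) for i in range(1, 31)]
ys = [2.0 * x + 1.0 for x in xs]

models = bagging_fit(xs, ys, n_models=20)
preds = [bagging_predict(models, x) for x in xs]
print(rmse(ys, preds), mape(ys, preds), pearson_r(ys, preds))
```

In a setting like the one described, the same three indices would be computed on the testing subset for the individual model and for each ensemble, and the ensembles ranked on those values.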
