Practical recommendations for machine learning in underground rock engineering – On algorithm development, data balancing, and input variable selection

Research has demonstrated that machine learning algorithms (MLAs) are a powerful addition to the rock engineering toolbox, yet they remain a largely untapped resource in engineering practice. The reluctance to adopt MLAs in standard practice is often attributed to the ‘opaque’ nature of the algorithms, the complexity of developing them, and the difficulty of determining how they use the datasets. This article presents tools and processes for MLA development, input selection, and data balancing in practical underground rock engineering. MLAs for classification and regression, the two main machine learning applications, are presented in terms of how each is developed to extract information from a dataset and produce the desired output. Engineering verification metrics are selected according to their suitability for each type of output. Methods for input selection and data balancing are discussed, with a focus on selecting input data appropriate to the problem without introducing bias or excess complexity. Each tool and process for algorithm development, data preparation, and input selection is illustrated with a case study. This article demonstrates that geotechnical practitioners can extract additional value by applying MLAs to rock engineering problems. Once the functions of MLAs are understood, the building blocks and open-source code can be adapted to suit the rock mass behaviour of interest.
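The abstract does not state which balancing method or verification metrics the article uses. The following is therefore only a minimal sketch of why data balancing and metric choice interact for an imbalanced classification problem. It assumes random oversampling, a simpler alternative to SMOTE that duplicates existing minority samples instead of interpolating synthetic ones, and uses balanced accuracy (the mean of per-class recall) as an illustrative metric. The function names and the class labels `"stable"` and `"failure"` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen samples of each smaller
    class until every class matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    target = max(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        # Draw duplicates (with replacement) only for under-represented classes.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_bal.append(features)
            y_bal.append(label)
    return X_bal, y_bal


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; unlike plain accuracy, it is not dominated
    by the majority class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical toy dataset: nine "stable" cases and one "failure" case.
X = [[i] for i in range(10)]
y = ["stable"] * 9 + ["failure"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 9 samples

# A classifier that always predicts the majority class looks good on
# plain accuracy (0.9) but is exposed by balanced accuracy (0.5).
y_pred = ["stable"] * 10
print(balanced_accuracy(y, y_pred))
```

In practice, oversampling should be applied only to the training split. Duplicating samples before the train/test split lets copies of the same case appear on both sides, which inflates the verification metrics and is one way balancing can introduce bias.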
