Process Variable Importance Analysis by Use of Random Forests in a Shapley Regression Framework

Linear regression is often used as a diagnostic tool to understand the relative contributions of operational variables to a key performance indicator or response variable. However, owing to the nature of plant operations, predictor variables tend to be correlated, often highly so, which can seriously complicate the assessment of their importance. Shapley regression is regarded as the only axiomatic approach to this problem, but to date it has been used almost exclusively with linear models. In this paper, the approach is extended to random forests, and the results are compared with the empirical variable importance measures widely used with these models, namely the permutation and Gini variable importance measures. Four case studies are considered, two based on simulated data and two on real-world data from the mineral processing industries. These case studies suggest that the random forest Shapley variable importance measure may be a more reliable indicator of the influence of predictor variables than the other measures considered. Moreover, the results obtained with the Gini variable importance measure were as reliable as, or more reliable than, those obtained with the permutation measure.
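To make the approach concrete, the sketch below illustrates one way a Shapley decomposition of goodness of fit can be computed with random forests in place of linear models. It assumes out-of-bag R² as the fit measure and scikit-learn's RandomForestRegressor; the function names (rf_r2, shapley_rf_importance) and all settings are illustrative and are not taken from the paper.

```python
# Minimal sketch: Shapley-value decomposition of goodness of fit (R^2)
# with random forests as the underlying model. Out-of-bag R^2 is assumed
# as the value function; other fit measures (e.g., cross-validated R^2)
# could be substituted.

from itertools import combinations
from math import factorial
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_r2(X, y, features, random_state=0):
    """Out-of-bag R^2 of a random forest fitted on a subset of features."""
    if not features:
        return 0.0  # an empty model explains no variance
    model = RandomForestRegressor(
        n_estimators=200, oob_score=True, random_state=random_state
    )
    model.fit(X[:, list(features)], y)
    return max(model.oob_score_, 0.0)  # clamp noisy negative OOB scores

def shapley_rf_importance(X, y):
    """Decompose the overall R^2 over predictors via Shapley values.

    Exhaustive over all 2^p feature subsets, so only feasible for small p;
    sampling over permutations would be needed for larger problems, and the
    subset fits are recomputed here rather than cached, for clarity.
    """
    p = X.shape[1]
    phi = np.zeros(p)
    others = lambda j: [k for k in range(p) if k != j]
    for j in range(p):
        for size in range(p):
            for S in combinations(others(j), size):
                # Shapley weight |S|! (p - |S| - 1)! / p!
                w = factorial(size) * factorial(p - size - 1) / factorial(p)
                gain = rf_r2(X, y, S + (j,)) - rf_r2(X, y, S)
                phi[j] += w * gain
    return phi  # sums (approximately) to the full-model R^2

# Example with simulated, correlated predictors:
# rng = np.random.default_rng(1)
# X = rng.normal(size=(500, 4)); X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=500)
# y = X[:, 0] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=500)
# print(shapley_rf_importance(X, y))
```

By contrast, the permutation and Gini (impurity-based) importance measures compared in the paper are obtained from a single fitted forest; in scikit-learn, for example, the impurity-based measure is available directly as feature_importances_. The Shapley decomposition trades this convenience for an exponential number of subset fits.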
