Improvement of identification of blast furnace ironmaking process by outlier detection and missing value imputation

Abstract Controlling the blast furnace ironmaking process requires a model of the process dynamics that is accurate enough to support control strategies. However, data sets collected from blast furnaces contain a considerable number of missing values and outliers. These values can significantly affect subsequent statistical analysis, and thus the identification of the whole process, so it is important to deal with them. This paper considers a data processing procedure comprising missing value imputation and outlier detection, and examines the impact of this processing on the identification of the blast furnace ironmaking process. Missing values are imputed using a decision tree algorithm, and outliers are identified and discarded through a set of multivariate outlier detection methods. The data sets before and after processing are then used for identification. Two classic identification methods, N4SID (numerical algorithms for state space subspace system identification) and PEM (prediction error method), are considered, and a comparative study is presented.
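The abstract does not name the specific multivariate outlier detection methods used. As an illustration only, the sketch below applies one classical criterion, the squared Mahalanobis distance compared against a chi-square cutoff, to two variables. The function names, the 0.975 quantile, and the sample values are assumptions made for this example and are not taken from the paper.

```python
import math


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) pair to the sample mean.

    Uses the classical (non-robust) sample mean and covariance.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Unbiased sample covariance entries (denominator n - 1).
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0.0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d^T S^{-1} d written out with the closed-form 2x2 inverse.
        distances.append((dx * dx * syy - 2.0 * dx * dy * sxy + dy * dy * sxx) / det)
    return distances


def flag_outliers(points, quantile=0.975):
    """Indices whose squared distance exceeds the chi-square quantile.

    For 2 degrees of freedom the chi-square quantile has the closed
    form -2 * ln(1 - quantile), so no statistics library is needed.
    """
    cutoff = -2.0 * math.log(1.0 - quantile)
    return [i for i, d2 in enumerate(mahalanobis_sq_2d(points)) if d2 > cutoff]


# Hypothetical two-variable measurements; the last row breaks the
# linear trend followed by the other nine.
SAMPLE = [
    (1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8),
    (6.0, 12.1), (7.0, 13.9), (8.0, 16.2), (9.0, 17.8), (5.0, 30.0),
]

if __name__ == "__main__":
    print(flag_outliers(SAMPLE))
```

Because the mean and covariance here are estimated from the contaminated data, several outliers can mask one another. Robust estimators are a common remedy, but whether the paper uses them is not stated in the abstract.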
