A Data Cleaning Approach for a Structural Health Monitoring System in a 75 MW Electric Arc Ferronickel Furnace

Within a framework of scientific and technical cooperation between the smelting company Cerro Matoso S.A. (CMSA) and the Universidad Nacional de Colombia (UNAL), a project was developed to use the data collected by a sensor network in a ferronickel electric arc furnace at CMSA to improve the structural health monitoring process. This sensor network provides online temperature measurements along the refractory lining of the electric furnace, as well as heat fluxes and the chemical characterization of the minerals at each stage of the process. These data are stored in a local database that holds several years of historical records with valuable information for control and analysis purposes. Because they reflect the behavior of the industrial process, they can be used to develop machine learning models that predict some of the operating parameters of the electric arc furnace and thus improve decision-making. Currently, most of the data are analyzed by the experts of the structural control department, but the large volume of data makes analytical tools necessary to support their work. This paper proposes a data cleaning approach that improves data quality by creating a set of rules and filters based on both expert judgment and best practices in data quality. A statistical analysis was also carried out to detect variables with anomalies and outliers that do not reflect real operating parameters and should therefore be excluded from modeling. With the proposed process, the quality of the data was improved and abnormal data were eliminated, consolidating a clean data set for later use in the development of machine learning models. This work contributes to the understanding of the data cleansing rules that must be considered so that the data reflect the real behavior of the electric furnace operation in further analysis and modeling tasks.
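
As an illustration only, the two kinds of cleaning step named above (an expert-defined rule and a statistical outlier filter) might be sketched as follows. The temperature limits, the function names, and the choice of Tukey's interquartile-range rule are assumptions made for this example; they are not the rules or the statistical method used in the study.

```python
from statistics import quantiles

# Hypothetical plausibility limits for a lining thermocouple, in degrees Celsius.
# Illustrative assumptions only, not limits reported for the CMSA furnace.
TEMP_MIN_C = 0.0
TEMP_MAX_C = 1600.0


def apply_range_rule(readings, low=TEMP_MIN_C, high=TEMP_MAX_C):
    """Rule-based filter: drop missing readings and values outside [low, high]."""
    return [r for r in readings if r is not None and low <= r <= high]


def iqr_outlier_mask(values, k=1.5):
    """Statistical filter: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(values) < 2:
        # quantiles() needs at least two points; flag nothing in that case.
        return [False] * len(values)
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


def clean(readings):
    """Apply the range rule first, then remove the flagged outliers."""
    in_range = apply_range_rule(readings)
    mask = iqr_outlier_mask(in_range)
    return [v for v, is_outlier in zip(in_range, mask) if not is_outlier]
```

Applying the range rule before the statistical filter keeps sentinel values (for example, a logger writing -9999 on sensor failure) from distorting the quartiles; in practice the limits would come from the process experts rather than from constants such as these.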