Modeling of complex industrial process based on active semi-supervised clustering

Abstract Because industrial processes operate over a wide range of conditions, it is difficult to build a single global model that describes a process. A solution widely used in control engineering practice is to combine multiple models built from collected process data. For this approach to succeed, the data must be clustered before modeling. In this study, pairwise constraints and an active-learning method were incorporated into the affinity propagation algorithm, resulting in a new method called active semi-supervised affinity propagation (ASSAP) clustering. To apply ASSAP to the modeling of industrial processes, an active-learning strategy is first used to obtain constraints on the data, based on the angle of change between two data points and the probability of their belonging to the same class; the constraints are then used to adjust the clustering process so as to improve clustering precision. Finally, a least-squares support vector machine (LS-SVM) is used to build a submodel for each cluster of data points, and all the submodels are integrated into a model for the whole data set. ASSAP was verified on data from the UCI (University of California, Irvine) Machine Learning Repository and on the Olivetti dataset. In addition, ASSAP and LS-SVM were combined and applied to data from the combustion process of a coke oven. The results show that the ASSAP-based method is effective for modeling complex industrial processes.
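
The final stage described in the abstract (one LS-SVM submodel per cluster, then integration into a single model) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that cluster labels are already available from some clustering step (ASSAP itself is not reproduced here), uses the standard LS-SVM regression linear system with a Gaussian kernel, and integrates the submodels by routing each query to the nearest cluster center. The abstract does not specify the kernel, the hyperparameters, or the integration rule, so the function names, `gamma`, `sigma`, and the nearest-center rule are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def fit_lssvm(X, y, gamma=100.0, sigma=1.0):
    """Solve the LS-SVM regression system
        [0   1^T        ] [b    ]   [0]
        [1   K + I/gamma] [alpha] = [y]
    and return (b, alpha). gamma and sigma are assumed values."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]


def predict_lssvm(model, X_train, X_new, sigma=1.0):
    """Evaluate f(x) = sum_i alpha_i k(x, x_i) + b."""
    b, alpha = model
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


def fit_multimodel(X, y, labels, gamma=100.0, sigma=1.0):
    """Fit one LS-SVM submodel per cluster. `labels` may come from any
    clustering method; here they are simply taken as given."""
    submodels = {}
    for c in np.unique(labels):
        mask = labels == c
        Xc, yc = X[mask], y[mask]
        submodels[int(c)] = {
            "X": Xc,
            "center": Xc.mean(axis=0),
            "model": fit_lssvm(Xc, yc, gamma, sigma),
        }
    return submodels


def predict_multimodel(submodels, X_new, sigma=1.0):
    """Integrate the submodels by sending each query point to the submodel
    whose cluster center is nearest (an assumed integration rule)."""
    keys = list(submodels)
    centers = np.stack([submodels[k]["center"] for k in keys])
    d2 = ((X_new[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    out = np.empty(len(X_new))
    for i, k in enumerate(keys):
        sel = nearest == i
        if sel.any():
            m = submodels[k]
            out[sel] = predict_lssvm(m["model"], m["X"], X_new[sel], sigma)
    return out
```

Given arrays `X` (samples by features), `y`, and integer `labels`, calling `fit_multimodel(X, y, labels)` followed by `predict_multimodel(submodels, X_new)` yields one prediction per query point. A weighted blend of submodel outputs would be an equally plausible integration rule; hard routing is used here only because it is the simplest to state.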
