On Generative Topographic Mapping and Graph Theory combined approach for unsupervised non-linear data visualization and fault identification

Abstract: Process monitoring of chemical plants relies on two steps: discriminating anomalies (fault detection) and characterizing them (fault identification). This work proposes a combined Generative Topographic Mapping (GTM) and Graph Theory (GT) approach. GTM highlights system features by reducing the dimensionality of the variables, and provides a strategy for calculating the similarity between samples. GT then clusters the samples using networks, discriminating normal from anomalous entries. Because labeling samples as normal or anomalous can be biased, the proposed methodology is unsupervised, meaning that no labels are used. Three case studies were considered: a simulation data set, the Tennessee Eastman process, and an industrial data set. The Q and T² indexes of Principal Component Analysis (PCA), dynamic PCA, and kernel PCA, alongside GTM and GT as independent monitoring methodologies, were used for comparison, considering both supervised and unsupervised approaches. For the industrial scenario, soft sensors were used to assess discrimination performance. The proposed method, while unsupervised, discriminated normal states similarly to the supervised strategies, justifying its development.
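The idea of clustering samples through a similarity network can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Gaussian similarity computed directly on the raw samples (standing in for the GTM-derived similarity), uses connected components of a thresholded graph (standing in for the GT clustering step), and treats the largest component as normal operation. The function names, `bandwidth`, and `threshold` are hypothetical choices made for illustration.

```python
import math
from collections import deque


def similarity_graph(samples, bandwidth=1.0, threshold=0.5):
    """Link pairs of samples whose Gaussian similarity exceeds `threshold`.

    Assumption: similarity comes from raw Euclidean distance, not from GTM.
    """
    n = len(samples)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            similarity = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
            if similarity > threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def connected_components(adjacency):
    """Group nodes into connected components with a breadth-first search."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


def flag_anomalies(samples, bandwidth=1.0, threshold=0.5):
    """Return indices outside the largest component.

    Assumption: the largest cluster corresponds to normal operation.
    """
    graph = similarity_graph(samples, bandwidth, threshold)
    normal = max(connected_components(graph), key=len)
    return sorted(set(range(len(samples))) - set(normal))


# Four samples near the origin and two far away; no labels are supplied.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0), (5.1, 5.0)]
anomalies = flag_anomalies(samples)
```

In this toy data set the two distant samples form their own small component, so they are the ones flagged. A community detection algorithm could replace the connected-components step when clusters are not cleanly separated.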
