Dimensionality reduction for visualizing industrial chemical process data

Abstract: This paper explores dimensionality reduction (DR) approaches for visualizing high-dimensional data in chemical processes. In an industrial context, visualization provides insight into the process, builds process understanding, and accelerates troubleshooting. A diverse array of existing, easy-to-use DR methods is evaluated in three case studies on large-scale industrial manufacturing plants. Supervised and unsupervised cases are presented with the objective of solving typical industrial problems related to unplanned events, plant performance improvement, and troubleshooting of quality underperformance. For the unsupervised case, the evaluation aims to identify approaches that provide insight beyond that of PCA (Principal Component Analysis), and also examines quality metrics of the reduced (latent) space, which characterize the degree of trust in the DR. UMAP (Uniform Manifold Approximation and Projection) outperforms the other techniques and brings insights that the other methods do not reveal. For the supervised case, UMAP is combined with traditional variable selection methods, such as VIP (Variable Influence on Projection) weights from PLS-DA (Partial Least Squares Discriminant Analysis), to improve latent space visualization by increasing the separation between classes.
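The abstract does not name the quality metrics it examines. As one illustration of what such a metric can look like, the sketch below computes trustworthiness, a widely used rank-based score that penalizes points appearing as close neighbours in the latent space when they were not close neighbours in the original space. The choice of this metric, the Euclidean distance, and the names `neighbour_order` and `trustworthiness` are assumptions made for illustration, not the authors' implementation.

```python
import math


def neighbour_order(points, i):
    """Indices of all other points, sorted from nearest to farthest from point i.

    Ties in distance are broken by index so the ordering is deterministic.
    """
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: (math.dist(points[i], points[j]), j))


def trustworthiness(high, low, k):
    """Trustworthiness of an embedding `low` of the original data `high`.

    Both arguments are sequences of equal length holding coordinate tuples.
    Returns a value of at most 1.0; 1.0 means every k-nearest neighbour in
    the embedding is also a k-nearest neighbour in the original space.
    """
    n = len(high)
    if len(low) != n:
        raise ValueError("high and low must contain the same number of points")
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")

    penalty = 0
    for i in range(n):
        # Rank of every other point in the original space (1 = nearest).
        rank_high = {j: r for r, j in enumerate(neighbour_order(high, i), start=1)}
        # Points that the embedding places among the k nearest neighbours.
        for j in neighbour_order(low, i)[:k]:
            if rank_high[j] > k:
                # j is an "intruder": near in the embedding, far in the original.
                penalty += rank_high[j] - k

    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))
```

Calling `trustworthiness(X, Z, k)` with the original measurements `X` and a two-dimensional embedding `Z` gives one number per choice of `k`; scanning a few values of `k` shows whether the local neighbourhoods seen in a plot can be trusted at small or large scales.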
