Effect of dataset size on modeling and monitoring of chemical processes

Abstract Multivariate data analysis is a powerful tool for process monitoring. The theoretical methodology of real-time multivariate data analysis has been studied over the last decade, but the effect of dataset size on model structure and fault detection ability has not yet been reported. This paper studies the requirements for a minimum dataset for multivariate data analysis modeling and provides a practical approach to evaluating the model structure. A method based on the statistical index g2 and cross-validation is proposed to determine the minimum dataset size needed for a valid statistical process monitoring model. The proposed method is built on the linear PLS model and is illustrated with case studies of both batch and continuous processes. The paper contributes to the theoretical development of multivariate data analysis and demonstrates its application to chemical processes.
