Building Multivariate Models from Compressed Data

This paper evaluates the effect of data compression on multivariate analysis, in particular on modeling based on principal component analysis (PCA). The swinging door and wavelet compression algorithms are compared in the context of multivariate data analysis. Wavelet compression is shown to preserve the correlation between variables better than swinging door compression. The impact of compression is also shown to increase as the process dynamics become faster and more stochastic. Instead of reconstructing swinging-door-compressed data by interpolation and then building a model, an iterative missing-data technique is suggested for building a PCA model from swinging-door-compressed data. The performance of the proposed methodology is demonstrated on a simulated flow-network system and an industrial data set.
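To make the idea of an iterative missing-data technique concrete, the following is a minimal sketch of one common generic scheme (alternating PCA fitting and re-imputation), not necessarily the algorithm used in the paper. It assumes that samples discarded by the compressor are represented as NaN; the function name `iterative_pca_impute` and its parameters are hypothetical.

```python
import numpy as np

def iterative_pca_impute(X, n_components, n_iter=200, tol=1e-8):
    """Fit a PCA model to data with missing entries (NaN) by alternating
    between a low-rank PCA reconstruction and re-imputation.

    Generic illustrative scheme; assumes every column has at least one
    observed value. Returns the filled data and the loading matrix.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: replace each missing entry by its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iter):
        # Fit PCA to the currently filled data via SVD.
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)

        # Rank-k reconstruction from the leading components.
        k = n_components
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mean

        # Overwrite only the missing entries; observed values stay fixed.
        new_filled = np.where(missing, recon, X)
        change = np.max(np.abs(new_filled - filled))
        filled = new_filled
        if change < tol:
            break

    # Final loadings computed from the converged filled data.
    mean = filled.mean(axis=0)
    _, _, Vt = np.linalg.svd(filled - mean, full_matrices=False)
    return filled, Vt[:n_components].T
```

In this sketch the retained (observed) samples are never modified, so the model is driven only by recorded values, whereas interpolation-based reconstruction would insert synthetic points between them before modeling.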
