Chapter 5 Data Reduction Using Principal Components Analysis

Publisher Summary

This chapter explores principal components analysis (PCA) as a method of data reduction. PCA is probably the multivariate statistical technique most widely used by chemometricians today. It transforms a set of correlated variables into a set of uncorrelated variables (principal components) such that the first few components explain most of the variation in the data. PCA is applied to high-dimensional data sets to identify and display their variation structure, to classify samples, to detect outliers, and to reduce the data. PCA also forms the basis of the SIMCA classification technique, and the partial least squares (PLS) regression technique evolved from the NIPALS algorithm for performing PCA. All real data contain experimental or random noise. PCA extracts some of this error, which is usually represented by the principal components with the smallest variance, so removing these components is one form of data reduction. The chapter focuses on two groups of methods: those that rely on knowledge of the experimental error in the data and those that require none. It also discusses and compares the different variable reduction methods based on the PC model fitted to a data set. The chapter also presents a data set of nuclear magnetic resonance (NMR) spectral peak heights for a set of mixtures. The data set is theoretical and contains no blending or NMR spectroscopy measurement errors.
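The transformation described in the summary can be sketched in a few lines of NumPy. This is a minimal illustration, not the chapter's own procedure: it computes the components by singular value decomposition rather than NIPALS, and the function name `pca_reduce`, the simulated data, and the choice of two retained components are all assumptions made for the example.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Keep the leading principal components of X and rebuild X from them."""
    mean = X.mean(axis=0)
    Xc = X - mean  # column-centre the data before decomposing
    # SVD of the centred data: rows of Vt are the loadings (PC directions)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    scores = U[:, :n_components] * s[:n_components]  # uncorrelated new variables
    loadings = Vt[:n_components]
    X_hat = scores @ loadings + mean  # reconstruction without the small components
    return scores, loadings, explained, X_hat


# Hypothetical data: 30 samples of 6 correlated variables generated from
# 2 underlying factors plus a small amount of random noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.01 * rng.normal(size=(30, 6))

scores, loadings, explained, X_hat = pca_reduce(X, n_components=2)
print("variance explained by each component:", np.round(explained, 4))
print("mean squared reconstruction error:", np.mean((X - X_hat) ** 2))
```

In this simulated case the discarded components carry little besides the added noise, so the two retained score columns replace the six original variables with almost no loss. With real data the number of components to keep has to be chosen by one of the methods the chapter discusses.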
