Fast principal component analysis of large data sets

Abstract Principal component analysis (PCA) and principal component regression (PCR) are widely used algorithms for calibrating spectrometers and evaluating unknown measurement spectra. In many measurement tasks, the amount of calibration data is growing because of new devices such as hyperspectral imagers. The core of PCA is the singular value decomposition (SVD) of the matrix containing the calibration spectra. The SVD of large calibration sets is computationally very expensive and often becomes impractical because of excessive calculation times. With hyperspectral imaging as the intended application, an algorithm is proposed that compresses the calibration spectra by a wavelet transformation before the SVD is performed. Restricting the SVD to the relevant wavelet coefficients accelerates it. After the relevant principal components (PCs) have been determined from this shrunken calibration matrix in the wavelet domain, they are expanded again by inserting zeros at the positions of the discarded coefficients. Denoised PCs are then obtained by the inverse wavelet transform back into the wavelength domain. An additional speed increase is described for “landscape” matrices, obtained by transposing the matrix before performing the SVD. In the Results section, both PCA approaches are shown to yield comparable PCs, using synthetically generated spectra as well as experimental FTIR data. With this algorithm, the PCA of the discussed examples could be accelerated by up to a factor of 52. Additionally, concentrations of synthetic spectra are evaluated using the PCs obtained by the different PCA algorithms. Both PC sets, the conventional one and the one based on the new technique, result in equivalent concentration values.
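
The pipeline described in the abstract (wavelet transform, selection of relevant coefficients, SVD of the shrunken matrix, zero insertion, inverse transform) can be illustrated with a minimal Python sketch. Several choices here are assumptions made only for illustration and are not taken from the paper: the orthonormal Haar wavelet, the selection of "relevant" coefficients by their summed energy across all spectra, the absence of mean centering, and the hypothetical function names `haar_forward`, `haar_inverse` and `wavelet_pca`. The spectra are assumed to be stored as rows, with a number of wavelength channels divisible by `2**levels`.

```python
import numpy as np


def haar_forward(x, levels):
    """Orthonormal multi-level Haar transform along the last axis."""
    out = np.array(x, dtype=float)
    n = out.shape[-1]
    for _ in range(levels):
        approx = (out[..., 0:n:2] + out[..., 1:n:2]) / np.sqrt(2.0)
        detail = (out[..., 0:n:2] - out[..., 1:n:2]) / np.sqrt(2.0)
        out[..., : n // 2] = approx
        out[..., n // 2 : n] = detail
        n //= 2
    return out


def haar_inverse(c, levels):
    """Inverse of haar_forward (same coefficient layout)."""
    out = np.array(c, dtype=float)
    n = out.shape[-1] >> levels  # length of the coarsest approximation
    for _ in range(levels):
        approx = out[..., :n].copy()
        detail = out[..., n : 2 * n].copy()
        out[..., 0 : 2 * n : 2] = (approx + detail) / np.sqrt(2.0)
        out[..., 1 : 2 * n : 2] = (approx - detail) / np.sqrt(2.0)
        n *= 2
    return out


def wavelet_pca(spectra, n_pcs, keep, levels=3):
    """Return n_pcs principal components (as rows) in the wavelength domain.

    spectra: array of shape (n_spectra, n_wavelengths)
    keep:    number of wavelet coefficients retained (assumed criterion:
             largest energy summed over all spectra)
    """
    coeffs = haar_forward(spectra, levels)

    # Keep only the 'keep' coefficient positions with the highest energy.
    energy = np.sum(coeffs**2, axis=0)
    idx = np.sort(np.argsort(energy)[-keep:])
    small = coeffs[:, idx]

    # For a "landscape" matrix (fewer rows than columns), decompose the
    # transpose; the PCs are then the left singular vectors.
    if small.shape[0] < small.shape[1]:
        u, _, _ = np.linalg.svd(small.T, full_matrices=False)
        pcs_small = u[:, :n_pcs].T
    else:
        _, _, vt = np.linalg.svd(small, full_matrices=False)
        pcs_small = vt[:n_pcs]

    # Expand by inserting zeros at the discarded positions, then go back
    # to the wavelength domain.
    pcs_wavelet = np.zeros((n_pcs, coeffs.shape[1]))
    pcs_wavelet[:, idx] = pcs_small
    return haar_inverse(pcs_wavelet, levels)
```

A call such as `wavelet_pca(A, n_pcs=2, keep=16)` on a calibration matrix `A` with 64 wavelength channels would run the SVD on a matrix with 16 columns instead of 64. Because the Haar transform used here is orthonormal, the expanded PCs stay orthonormal after the inverse transform. Whether transposing actually saves time depends on the SVD routine in use; with NumPy's LAPACK-based `svd` the branch mainly illustrates how the roles of left and right singular vectors swap.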
