Kernel Partial Least Squares for the Identification of Mixture Content from TeraHertz Spectra

This paper introduces kernel partial least squares (K-PLS) for the identification of mixture content from terahertz (THz) spectra. K-PLS is a nonlinear extension of the partial least squares (PLS) method commonly used in chemometrics. K-PLS and PLS are considered superior to peak-matching methods for mixture spectra of multiple compounds because they avoid having to address the problem of overlapping peaks explicitly. THz radiation transmits easily through most dielectric materials and serves as a new tool for collecting the original spectral readings in transmission, diffusion, and reflection. A multi-output K-PLS method is presented that models mixture composition from training patterns of pure substances, under the assumption of linear spectral mixing. Preprocessing consists of a wavelet transform of the THz spectra followed by an independent component analysis (ICA) transform. Preliminary results show that the ICA+K-PLS approach classifies pure spectra accurately and gives an accurate estimate of composition from mixed THz spectra, even where peaks in these spectra overlap severely.
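The idea of training a multi-output K-PLS model on pure-substance spectra and reading off mixture fractions can be illustrated with a minimal sketch. It is not the authors' implementation: it follows the general NIPALS-style kernel PLS recipe, skips the wavelet and ICA preprocessing, uses synthetic Gaussian-peak "spectra", and uses a linear kernel (the abstract does not state the kernel) so that the linear-mixing assumption holds exactly. All function and variable names are hypothetical.

```python
import numpy as np

def center_train_kernel(K):
    """Center a training kernel matrix in feature space."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J

def center_test_kernel(K_test, K_train):
    """Center a test-vs-train kernel using training statistics."""
    n = K_train.shape[0]
    ones_n = np.ones((n, n)) / n
    ones_t = np.ones((K_test.shape[0], n)) / n
    return (K_test - ones_t @ K_train) @ (np.eye(n) - ones_n)

def kpls_fit(K, Y, n_components, n_iter=500, tol=1e-12):
    """NIPALS-style kernel PLS; returns dual coefficients and response means."""
    n = K.shape[0]
    y_mean = Y.mean(axis=0)
    Kc, Yc = center_train_kernel(K), Y - y_mean
    Kd, Yd = Kc.copy(), Yc.copy()          # deflated working copies
    T = np.zeros((n, n_components))        # orthonormal score vectors
    U = np.zeros((n, n_components))        # response-side score vectors
    for a in range(n_components):
        # Start from the response column with the largest remaining norm.
        u = Yd[:, np.argmax(np.linalg.norm(Yd, axis=0))].copy()
        u /= np.linalg.norm(u)
        for _ in range(n_iter):
            t = Kd @ u
            t /= np.linalg.norm(t)
            u_new = Yd @ (Yd.T @ t)
            u_new /= np.linalg.norm(u_new)
            done = np.linalg.norm(u_new - u) < tol
            u = u_new
            if done:
                break
        # Recompute t from the stored u so that T and U stay consistent.
        t = Kd @ u
        t /= np.linalg.norm(t)
        T[:, a], U[:, a] = t, u
        # Deflate the kernel and the responses by the new score vector.
        P = np.eye(n) - np.outer(t, t)
        Kd = P @ Kd @ P
        Yd = Yd - np.outer(t, t @ Yd)
    B = U @ np.linalg.solve(T.T @ Kc @ U, T.T @ Yc)
    return B, y_mean

def kpls_predict(K_test, K_train, B, y_mean):
    return center_test_kernel(K_test, K_train) @ B + y_mean

# Synthetic pure "spectra" with overlapping peaks (illustrative data only).
freq = np.linspace(0.1, 3.0, 200)

def peak(center, width):
    return np.exp(-0.5 * ((freq - center) / width) ** 2)

pure = np.vstack([
    peak(1.0, 0.30) + 0.5 * peak(2.0, 0.20),
    peak(1.2, 0.30) + 0.7 * peak(2.2, 0.25),
    peak(1.4, 0.35) + 0.4 * peak(1.9, 0.20),
])
labels = np.eye(3)                         # one output per pure substance

K_train = pure @ pure.T                    # linear kernel (assumption)
# Three centered training patterns span at most two directions,
# so two latent components are the most this toy problem supports.
B, y_mean = kpls_fit(K_train, labels, n_components=2)

true_fractions = np.array([0.5, 0.3, 0.2])
mixture = true_fractions @ pure            # linear spectral mixing
K_test = mixture[None, :] @ pure.T
estimated = kpls_predict(K_test, K_train, B, y_mean)[0]
train_fit = kpls_predict(K_train, K_train, B, y_mean)
print(np.round(estimated, 4))
```

With a linear kernel and noise-free data the model is affine in the spectrum, so the estimated fractions match the true ones up to numerical precision. With a nonlinear kernel such as a Gaussian, or with measurement noise, the estimate would only be approximate and the kernel width and number of components would need tuning.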
