Partial least squares, Beer's law and the net analyte signal: statistical modeling and analysis

Partial least squares (PLS) is one of the most common regression algorithms in chemistry. It relates input–output samples (x_i, y_i) through a linear multivariate model. In this paper we analyze the PLS algorithm under a specific probabilistic model for the relation between x and y. Following Beer's law, we assume a linear mixture model in which each data sample (x, y) is a random realization from a joint probability distribution. Under this model, x is the sum of k components, each multiplied by its characteristic response, and each component is a random variable. We analyze PLS on this model in two idealized settings: noise-free samples, and an infinite number of noisy training samples. In the noise-free case we prove that, as expected, the regression vector computed by PLS is, up to normalization, the net analyte signal. We also prove that PLS computes this vector after at most k iterations. In the case of an infinite training set corrupted by unstructured noise, we show that the final PLS regression vector is, in general, not exactly proportional to the net analyte signal vector. It does, however, have the important property of being optimal under a mean squared error of prediction (MSEP) criterion. This result can be viewed as an asymptotic optimality property of PLS: it describes the behavior of PLS on very large, though finite, training sets.

Copyright © 2005 John Wiley & Sons, Ltd.
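The noise-free claim can be checked numerically. The sketch below is a minimal illustration under assumptions of our own, not the paper's experimental setup. The pure-component spectra A and the concentrations are drawn uniformly at random, y is the concentration of component 0, and PLS1 is computed with a textbook NIPALS loop. The helper names `pls1_nipals` and `net_analyte_signal` are hypothetical. With k factors, the PLS regression vector should coincide with NAS/‖NAS‖², which is the net analyte signal up to normalization.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Return the PLS1 regression vector computed with a textbook NIPALS loop."""
    X = X - X.mean(axis=0)          # mean-centre the spectra
    y = y - y.mean()                # mean-centre the response
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))   # weight vectors w_j
    P = np.zeros((n_features, n_components))   # X loadings p_j
    q = np.zeros(n_components)                 # y loadings q_j
    for j in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w                              # score vector
        tt = t @ t
        P[:, j] = X.T @ t / tt
        q[j] = y @ t / tt
        W[:, j] = w
        X = X - np.outer(t, P[:, j])           # deflate X
        y = y - q[j] * t                       # deflate y
    # Regression vector in centred coordinates: b = W (P^T W)^{-1} q
    return W @ np.linalg.solve(P.T @ W, q)


def net_analyte_signal(A, analyte=0):
    """Part of the analyte's pure spectrum orthogonal to all interferent spectra."""
    a = A[:, analyte]
    interferents = np.delete(A, analyte, axis=1)
    Q, _ = np.linalg.qr(interferents)          # orthonormal basis of the interferent span
    return a - Q @ (Q.T @ a)


# Hypothetical noise-free Beer's-law data (our assumption): each spectrum is x = A c,
# and the response y is the concentration of component 0.
rng = np.random.default_rng(0)
n_samples, n_wavelengths, k = 30, 50, 3
A = rng.uniform(0.0, 1.0, size=(n_wavelengths, k))  # pure-component spectra as columns
C = rng.uniform(0.0, 1.0, size=(n_samples, k))      # random concentrations, one row per sample
X = C @ A.T
y = C[:, 0]

b = pls1_nipals(X, y, n_components=k)               # exactly k PLS factors
nas = net_analyte_signal(A, analyte=0)
cosine = b @ nas / (np.linalg.norm(b) * np.linalg.norm(nas))
print("cosine(b, NAS):", cosine)
print("max |b - NAS/||NAS||^2|:", np.max(np.abs(b - nas / (nas @ nas))))
```

The normalization 1/‖NAS‖² gives a₁ᵀb = 1 and a_jᵀb = 0 for every interferent spectrum a_j. As a result, b recovers the analyte concentration exactly from a noise-free, mean-centred spectrum.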

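For the noisy, infinite-sample setting, the sample estimates can be replaced by population moments. The sketch below relies on several assumptions of our own, which are not necessarily the paper's exact model. The noise is white with covariance σ²I, the concentrations are zero-mean with a hypothetical covariance `Cc`, and y equals the first concentration. The sketch computes PLS1 through the standard Krylov-subspace characterization, in which the m-factor PLS1 vector minimizes E[(y − xᵀb)²] over span{s, Σs, ..., Σ^(m−1)s}, rather than through NIPALS. It then compares the k-factor vector with the minimum-MSEP linear predictor Σ⁻¹s. For contrast, it also evaluates the MSEP of the rescaled NAS vector. The function names are hypothetical.

```python
import numpy as np


def pls1_population(Sigma, s, n_components):
    """PLS1 vector from population moments Sigma = Cov(x), s = Cov(x, y).

    Minimizes E[(y - x^T b)^2] over the Krylov space span{s, Sigma s, ...}.
    """
    p = len(s)
    Q = np.zeros((p, n_components))    # orthonormal Krylov basis (Arnoldi)
    v = s.copy()
    for j in range(n_components):
        for _ in range(2):             # repeated Gram-Schmidt for numerical stability
            v = v - Q[:, :j] @ (Q[:, :j].T @ v)
        Q[:, j] = v / np.linalg.norm(v)
        v = Sigma @ Q[:, j]
    H = Q.T @ Sigma @ Q
    return Q @ np.linalg.solve(H, Q.T @ s)


def msep(b, Sigma, s, var_y):
    """Mean squared error of prediction E[(y - x^T b)^2] for zero-mean x and y."""
    return var_y - 2.0 * b @ s + b @ Sigma @ b


def net_analyte_signal(A, analyte=0):
    """Part of the analyte's pure spectrum orthogonal to all interferent spectra."""
    a = A[:, analyte]
    Q, _ = np.linalg.qr(np.delete(A, analyte, axis=1))
    return a - Q @ (Q.T @ a)


# Hypothetical population model (our assumption): x = A c + e, y = c[0],
# Cov(c) = Cc, Cov(e) = sigma^2 I, with c and e independent.
rng = np.random.default_rng(1)
n_wavelengths, k, sigma = 50, 3, 0.5
A = rng.uniform(0.0, 1.0, size=(n_wavelengths, k))
L = rng.normal(size=(k, k))
Cc = L @ L.T + np.eye(k)                        # a random positive-definite covariance
Sigma = A @ Cc @ A.T + sigma**2 * np.eye(n_wavelengths)
s = A @ Cc[:, 0]                                # Cov(x, y)
var_y = Cc[0, 0]

b_pls = pls1_population(Sigma, s, n_components=k)
b_opt = np.linalg.solve(Sigma, s)               # minimum-MSEP linear predictor
nas = net_analyte_signal(A)
b_nas = nas / (nas @ nas)                       # noise-free PLS vector, rescaled NAS

print("PLS(k) equals Sigma^-1 s:", np.allclose(b_pls, b_opt))
print("MSEP of PLS vector:", msep(b_pls, Sigma, s, var_y))
print("MSEP of NAS vector:", msep(b_nas, Sigma, s, var_y))
print("cosine(b_pls, NAS):", b_pls @ nas / (np.linalg.norm(b_pls) * np.linalg.norm(nas)))
```

Under these assumptions ΣA = A(Cc AᵀA + σ²I), so every Krylov vector stays in the span of the pure spectra. This is why k factors are enough here to reach Σ⁻¹s. The rescaled NAS vector satisfies Aᵀb = e₁, so its MSEP reduces to σ²/‖NAS‖², and the optimal vector can only match or improve on that value.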