A Statistical View of Some Chemometrics Regression Tools

Chemometrics is a field of chemistry that studies the application of statistical methods to chemical data analysis. In addition to borrowing many techniques from the statistics and engineering literatures, chemometrics itself has given rise to several new data-analytical methods. This article examines two methods commonly used in chemometrics for predictive modeling—partial least squares and principal components regression—from a statistical perspective. The goal is to understand the reasons for their apparent success, to identify the situations in which they can be expected to work well, and to compare them with other statistical methods intended for those situations. These methods include ordinary least squares, variable subset selection, and ridge regression.
