Comments on the relationship between principal components analysis and weighted linear regression for bivariate data sets

Regression and principal components analysis (PCA) are two of the most widely used techniques in chemometrics. In this paper, these methods are compared by considering their application to linear, two-dimensional data sets with a zero intercept. The need for accommodating measurement errors with these methods is addressed and various techniques to accomplish this are considered. Seven methods are examined: ordinary least squares (OLS), weighted least squares (WLS), the effective variance method (EVM), multiply weighted regression (MWR), unweighted PCA (UPCA), and two forms of weighted PCA. Additionally, five error structures in x and y are considered: homoscedastic equal, homoscedastic unequal, proportional equal, proportional unequal, and random. It is shown that for certain error structures, several of the methods are mathematically equivalent. Furthermore, it is demonstrated that all of the methods can be unified under the principle of maximum likelihood estimation, embodied in the general case by MWR. Extensive simulations show that MWR produces the most reliable parameter estimates in terms of bias and mean-squared error. Finally, implications for modeling in higher dimensions are considered.
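To make the comparison concrete, the following is a minimal sketch, not the authors' code, of two of the methods named above applied to zero-intercept bivariate data. It assumes homoscedastic, equal errors in x and y. The function names (`ols_slope`, `upca_slope`), the noise level, the true slope, and the random seed are illustrative assumptions. OLS minimizes vertical residuals only, whereas unweighted PCA of the uncentered data minimizes perpendicular distances to a line through the origin.

```python
import numpy as np


def ols_slope(x, y):
    """Zero-intercept OLS slope: minimizes the sum of squared vertical residuals."""
    return np.dot(x, y) / np.dot(x, x)


def upca_slope(x, y):
    """Zero-intercept unweighted PCA slope.

    The data are NOT mean-centered, so the first principal direction is
    constrained to pass through the origin. The slope is the ratio of the
    y and x loadings of that direction (undefined for a vertical line).
    """
    data = np.column_stack([x, y])
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    first_direction = vt[0]
    return first_direction[1] / first_direction[0]


# Simulated data (all values below are assumptions chosen for illustration).
rng = np.random.default_rng(0)
true_slope = 2.0
sigma = 0.5  # same error standard deviation in x and y
x_true = np.linspace(1.0, 10.0, 50)
x_obs = x_true + rng.normal(0.0, sigma, x_true.size)
y_obs = true_slope * x_true + rng.normal(0.0, sigma, x_true.size)

b_ols = ols_slope(x_obs, y_obs)
b_upca = upca_slope(x_obs, y_obs)
print(f"OLS slope:  {b_ols:.4f}")
print(f"UPCA slope: {b_upca:.4f}")
```

On noise-free data the two estimates coincide. With errors in x, the OLS slope is never larger in magnitude than the UPCA slope for positively correlated data, which reflects the attenuation that arises when errors in x are ignored.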
