Some cautionary notes on the use of principal components regression

Abstract: Many textbooks on regression analysis present principal components regression (PCR) as a way of treating multicollinearity problems. We have not encountered any strong justification for the methodology. In applying it to well-known data sets with severe multicollinearity, however, we have encountered serious actual and potential pitfalls. We present these pitfalls as cautionary notes, illustrated with numerical examples from well-known data sets. We also show, by theory and example, that PCR can fail miserably in the following sense: when the response variable is regressed on all p principal components (PCs), the first (p − 1) PCs contribute nothing toward the reduction of the residual sum of squares, yet the last PC alone (the one that is always discarded under PCR methodology) contributes everything. We then give conditions under which PCR fails totally in this sense.
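The failure mode described in the abstract can be reproduced with a small synthetic sketch. The data below are simulated and are not taken from the paper; the response is deliberately constructed to equal the last PC score, which is an assumed extreme case chosen for illustration. All variable names are hypothetical.

```python
import numpy as np

# Simulated (not the paper's) collinear predictors:
# the fourth column is almost exactly the sum of the first three.
rng = np.random.default_rng(0)
n, p = 200, 4
Z = rng.normal(size=(n, p - 1))
X = np.column_stack([Z, Z.sum(axis=1) + 0.01 * rng.normal(size=n)])
Xc = X - X.mean(axis=0)  # center the predictors

# Principal components via SVD; columns of `scores` are the PC scores,
# ordered from largest to smallest variance.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s
var_share = s**2 / np.sum(s**2)  # fraction of predictor variance per PC

# Assumption for illustration: the response is exactly the last PC score.
y = scores[:, -1].copy()
yc = y - y.mean()

# PC scores are mutually orthogonal, so each PC's contribution to the
# regression sum of squares can be computed separately.
ss_per_pc = (scores.T @ yc) ** 2 / s**2
rss_share = ss_per_pc / (yc @ yc)

print("variance share per PC:", np.round(var_share, 6))
print("share of response SS explained per PC:", np.round(rss_share, 6))
```

In this construction the last PC carries a negligible share of the predictor variance, so a variance-based PCR rule would discard it, yet it accounts for essentially all of the reduction in the residual sum of squares.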
