An Analytic Variable Selection Technique for Principal Component Regression

SUMMARY

This paper presents an analytic technique for deleting predictor variables from a linear regression model when principal components of X'X are removed to adjust for multicollinearities in the data. The technique can be adapted to commonly used variable selection procedures such as backward elimination to eliminate redundant predictor variables without appreciably increasing the residual sum of squares. An analysis of the pitprop data of Jeffers (1967) is performed to illustrate the methods proposed in the paper.

The use of principal component procedures in either multivariate analysis or multiple linear regression generally results in a reduction of the rank of the variable space. A shortcoming often mentioned, however, is that there is no corresponding reduction in the number of original variables which must be measured. Draper (1964) became well aware of this problem when he attempted to eliminate redundant quality tests on reels of paper. Although Draper concludes that the principal components do not aid in deciding which properties of the paper to test, Jeffers (1965) argues that they do, in fact, provide such information.

Jolliffe (1972) discusses eight methods of reducing the number of variables in multivariate problems, four of which utilize principal components. In a second paper, Jolliffe (1973) applies five of the eight techniques to real data, including a multiple linear regression analysis. This latter example concerns the pitprop data of Jeffers (1967), in which Jeffers used principal components to provide insight into which physical properties should be used to investigate the compressive strength of the props.

The purpose of this paper is to present another method of reducing the number of independent variables in a principal component regression analysis. The procedure first deletes principal components associated with small latent roots of X'X and then incorporates an analog of the backward elimination procedure (e.g. Draper and Smith, 1966, Chapter 6) to eliminate the independent variables. This furnishes an analytic procedure for deleting variables in a principal component analysis that is based upon minimal increases in the residual sum of squares.
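The two-stage idea above (delete components with small latent roots of X'X, then eliminate variables backward by smallest increase in residual sum of squares) can be illustrated with a minimal brute-force sketch in Python with NumPy. This is not the analytic technique of this paper. Instead of a closed-form expression for the increase in the residual sum of squares, it simply refits a principal component regression for every candidate deletion. Several details are assumptions chosen for illustration: the unit-length scaling of the predictors, the threshold `root_tol` for a "small" latent root, the stopping rule `min_vars`, and the function names `standardize`, `pcr_rss` and `backward_eliminate_pcr` (all hypothetical). The synthetic data are also invented for the example.

```python
import numpy as np


def standardize(X, y):
    """Center y; center each column of X and scale it to unit length,
    so that X'X is the correlation matrix of the predictors (assumed scaling)."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0), y - y.mean()


def pcr_rss(X, y, root_tol):
    """Residual sum of squares of a principal component regression.

    X and y are assumed centered, so no intercept is fitted. Components
    of X'X whose latent roots are <= root_tol (hypothetical threshold)
    are deleted before the coefficients are estimated.
    """
    roots, vectors = np.linalg.eigh(X.T @ X)   # X'X = V diag(roots) V'
    keep = roots > root_tol                    # retain the "large" roots only
    V = vectors[:, keep]
    # b = V diag(1/roots) V' X'y over the retained components
    b = V @ ((V.T @ (X.T @ y)) / roots[keep])
    resid = y - X @ b
    return float(resid @ resid)


def backward_eliminate_pcr(X, y, root_tol=0.05, min_vars=1):
    """Backward elimination by brute-force PCR refits (not the analytic method).

    At each step every remaining variable is tentatively dropped, the PCR
    model is refitted on the others, and the variable whose removal leaves
    the smallest residual sum of squares is eliminated.
    Returns (deletion order, RSS after each deletion, remaining columns).
    """
    Xs, yc = standardize(X, y)
    active = list(range(Xs.shape[1]))
    deleted, rss_path = [], []
    while len(active) > min_vars:
        trials = [
            (pcr_rss(Xs[:, [k for k in active if k != j]], yc, root_tol), j)
            for j in active
        ]
        best_rss, best_j = min(trials)   # smallest RSS after deletion
        active.remove(best_j)
        deleted.append(best_j)
        rss_path.append(best_rss)
    return deleted, rss_path, active


# Usage on synthetic data with one near-exact multicollinearity (x3 ~ x1 + x2).
rng = np.random.default_rng(0)
n = 40
x1, x2, x4 = rng.normal(size=(3, n))
x3 = x1 + x2 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])
y = 2 * x1 - x2 + 0.5 * x4 + 0.1 * rng.normal(size=n)
deleted, rss_path, remaining = backward_eliminate_pcr(X, y, root_tol=0.05, min_vars=2)
```

In this sketch the latent roots are recomputed from the reduced X'X at every step. As a result, the set of deleted components can change as variables are removed. With `root_tol=0` and a full-rank X, `pcr_rss` reduces to the ordinary least squares residual sum of squares.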