Multicollinearity: Diagnosing its Presence and Assessing the Potential Damage it Causes Least Squares Estimation

This paper suggests and examines a straightforward diagnostic test procedure that 1) provides numerical indexes whose magnitudes signify the presence of one or more near dependencies among columns of a data matrix X, and 2) provides a means for determining, within the linear regression model, the extent to which each such near dependency is degrading the least-squares estimation of each regression coefficient. In most instances this latter information also enables the investigator to determine specifically which columns of the data matrix are involved in each near dependency. The diagnostic test is based on an interrelation between two analytic devices, the singular-value decomposition (closely related to eigensystems) and a matching regression-variance decomposition. Both devices are developed in full. The test is successfully given empirical content through a set of experiments that examine its behavior when applied to several different series of data matrices having one or more known near dependencies that are weak to begin with and are made to become systematically more nearly perfectly collinear. The general diagnostic properties of the test that result from these experiments and the steps required to carry out the test are summarized, and then exemplified by application to real economic data.
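As a rough illustration of the two devices named above, the following Python sketch computes condition indexes from the singular values of X and a matching decomposition of each coefficient variance across those singular values. It is a minimal sketch under stated assumptions, not the paper's own implementation: the function name `collinearity_diagnostics`, the scaling of columns to unit length, and the synthetic data are all hypothetical choices made here for illustration.

```python
import numpy as np


def collinearity_diagnostics(X):
    """Return (condition indexes, variance-decomposition proportions) for X."""
    X = np.asarray(X, dtype=float)
    # Assumption of this sketch: scale each column to unit Euclidean length
    # so the indexes do not depend on the units of measurement.
    Xs = X / np.linalg.norm(X, axis=0)
    # Thin singular-value decomposition: Xs = U @ diag(mu) @ Vt
    _, mu, Vt = np.linalg.svd(Xs, full_matrices=False)
    # One condition index per singular value; a large value signals a near dependency.
    cond_index = mu.max() / mu
    # var(b_k) is proportional to sum_j v_kj^2 / mu_j^2, where V = Vt.T.
    phi = (Vt.T ** 2) / mu ** 2
    # proportions[j, k]: share of var(b_k) associated with singular value j.
    proportions = (phi / phi.sum(axis=1, keepdims=True)).T
    return cond_index, proportions


# Hypothetical data: the third column is almost exactly the sum of the first two.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = x1 + x2 + 1e-3 * rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

cond_index, proportions = collinearity_diagnostics(X)
worst = int(np.argmax(cond_index))  # row tied to the smallest singular value
print("condition indexes:", cond_index)
print("variance proportions for the largest index:", proportions[worst])
```

In this constructed case the largest condition index is very high, and the row of proportions associated with it is close to one for all three coefficients, which is the pattern that points to the columns involved in the near dependency.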
