Spurious regressions in econometrics

It is very common to see reported in the applied econometric literature time series regression equations with an apparently high degree of fit, as measured by the coefficient of multiple correlation R² or the corrected coefficient R̄², but with an extremely low value for the Durbin-Watson statistic. We find it very curious that, whereas virtually every textbook on econometric methodology contains explicit warnings of the dangers of autocorrelated errors, this phenomenon crops up so frequently in well-respected applied work. Numerous examples could be cited, but doubtless the reader has met sufficient cases to accept our point. It would, for example, be easy to quote published equations for which R² = 0.997 and the Durbin-Watson statistic (d) is 0.53. The most extreme example we have met is an equation for which R² = 0.99 and d = 0.093. However, we shall suggest that cases with much less extreme values may well be entirely spurious. The recent experience of one of us [see Box and Newbold (1971)] has indicated just how easily one can be led to produce a spurious model if sufficient care is not taken over an appropriate formulation for the autocorrelation structure of the errors from the regression equation. We felt, then, that we should undertake a more detailed enquiry seeking to determine what, if anything, could be inferred from those regression equations having the properties just described.

There are, in fact, as is well known, three major consequences of autocorrelated errors in regression analysis:
(i) Estimates of the regression coefficients are inefficient.
(ii) Forecasts based on the regression equations are sub-optimal.
(iii) The usual significance tests on the coefficients are invalid.
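The combination of a high R² and a low d can be reproduced with a short simulation. The sketch below is an illustration, not a procedure taken from the text: it regresses one random walk on a second, independently generated random walk by ordinary least squares (with a constant) and reports R² together with the Durbin-Watson statistic d = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t², where e_t are the regression residuals. The sample size n = 50, the seed, the 1,000 replications and the helper names ols_fit and durbin_watson are hypothetical choices made only for this example. Because the two series are unrelated by construction, any apparent fit is spurious.

```python
import numpy as np


def ols_fit(y, x):
    """Regress y on a constant and x by ordinary least squares.

    Returns (residuals, R^2). Hypothetical helper for illustration.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)  # total sum of squares about the mean
    r2 = 1.0 - np.sum(resid ** 2) / tss
    return resid, r2


def durbin_watson(resid):
    """d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


rng = np.random.default_rng(seed=0)  # assumed seed, for reproducibility only
n = 50                               # assumed sample size

# Repeatedly regress one independent random walk on another:
# X_t = X_{t-1} + a_t,  Y_t = Y_{t-1} + b_t,  with a_t, b_t independent N(0, 1).
r2s, ds = [], []
for _ in range(1000):
    x = np.cumsum(rng.standard_normal(n))
    y = np.cumsum(rng.standard_normal(n))
    resid, r2 = ols_fit(y, x)
    r2s.append(r2)
    ds.append(durbin_watson(resid))

print(f"median R^2 = {np.median(r2s):.3f}, median d = {np.median(ds):.3f}")
```

Since the residuals from such a regression inherit the strong positive autocorrelation of the random walks, d will typically lie well below 2, while R² is frequently large even though the series are unrelated. The exact figures depend on the seed and on n.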