Diagnostic Checking in Regression Relationships

The linear regression model is still one of the most popular tools for data analysis despite (or due to) its simple structure. Although it is appropriate in many situations, there are many pitfalls that might affect the quality of conclusions drawn from fitted models or might even lead to uninterpretable results. Some of these pitfalls that are considered especially important in applied econometrics are heteroskedasticity or serial correlation of the error terms, structural changes in the regression coefficients, nonlinearities, functional misspecification or omitted variables. Therefore, a rich variety of diagnostic tests for these situations has been developed in the econometrics community, and a collection of them has been implemented in the packages lmtest and strucchange, covering the problems mentioned above. These diagnostic tests are useful not only in econometrics but also in many other fields where linear regression is used, which we will demonstrate with an application from biostatistics. As Breiman (2001) argues, it is important to assess the goodness of fit of data models, in particular not only with omnibus tests but also with tests designed for a specific direction of the alternative. These diagnostic checks need not be seen as pure significance procedures; they can also serve as an exploratory tool to extract information about the structure of the data, especially in connection with residual plots or other diagnostic plots. As Brown, Durbin, and Evans (1975) argue for the recursive CUSUM test, these procedures can “be regarded as yardsticks for the interpretation of data rather than leading to hard and fast decisions.” Moreover, we will always be able to reject the null hypothesis provided we have enough data at hand. The question is not whether the model is wrong (it always is!) but whether the irregularities are serious.
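To make the idea of a test aimed at one specific direction of the alternative more concrete, the following is a minimal sketch of Koenker's studentized Breusch-Pagan statistic for heteroskedasticity. It assumes a regression with a single regressor and is written from scratch in plain Python for illustration; it is not the lmtest implementation (lmtest offers this test as `bptest()`), and the function names `ols_simple`, `r_squared` and `bp_statistic` are hypothetical.

```python
"""Minimal sketch (illustration only, not lmtest code): studentized
Breusch-Pagan statistic for a simple regression y = a + b*x."""


def ols_simple(x, y):
    """Return intercept, slope and residuals of the OLS fit y ~ 1 + x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid


def r_squared(x, y):
    """Coefficient of determination of the OLS fit y ~ 1 + x."""
    _, _, resid = ols_simple(x, y)
    my = sum(y) / len(y)
    tss = sum((yi - my) ** 2 for yi in y)
    rss = sum(e ** 2 for e in resid)
    return 1.0 - rss / tss


def bp_statistic(x, y):
    """Studentized Breusch-Pagan statistic n * R^2, where R^2 comes from
    regressing the squared OLS residuals on the regressor. Under
    homoskedasticity it is asymptotically chi-squared with 1 degree of
    freedom in this one-regressor setting (5% critical value about 3.84)."""
    _, _, resid = ols_simple(x, y)
    squared = [e ** 2 for e in resid]
    return len(x) * r_squared(x, squared)
```

The statistic is large when the spread of the residuals grows with the regressor, which is exactly the alternative this test targets; an omnibus test would spread its power over many other departures as well.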
The package strucchange implements a variety of procedures related to structural change of the regression coefficients; it was already introduced in R News by Zeileis (2001) and described in more detail in Zeileis, Leisch, Hornik, and Kleiber (2002). Therefore, we will focus on the package lmtest in the following. Most of the tests and the data sets contained in the package are taken from the book by Krämer and Sonnberger (1986), which originally inspired us to write the package. Compared to the book, we implemented later versions of some tests and modern, flexible interfaces for the procedures. Most of the tests are based on the OLS residuals of a linear model, which is specified by a formula argument. Instead of a formula, a fitted model of class "lm" can also be supplied; this should work if the data are either contained in the object or still present in the workspace, but it is not encouraged. The full references for the tests can be found on the help pages of the respective functions. We present applications of the tests contained in lmtest to two different data sets: the first is a macroeconomic time series from the U.S. analysed by Stock and Watson (1996), and the second is data from a study on measurements of fetal mandible length discussed by Royston and Altman (1994).
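Since most of the tests work on the OLS residuals of a linear model, a small sketch may help to show what such a residual-based statistic looks like. The code below is a hypothetical pure-Python illustration, assuming a regression with a single regressor, that computes the Durbin-Watson statistic for first-order serial correlation from the OLS residuals. In lmtest the corresponding test is `dwtest()`; its p-value computation is not reproduced here, and the names `ols_residuals` and `durbin_watson` are illustrative only.

```python
"""Minimal sketch (illustration only, not lmtest code): Durbin-Watson
statistic computed from the OLS residuals of y = a + b*x."""


def ols_residuals(x, y):
    """Residuals of the OLS fit y ~ 1 + x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def durbin_watson(resid):
    """DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2.

    Values near 2 suggest no first-order autocorrelation; values well
    below 2 point to positive, values well above 2 to negative
    autocorrelation of the errors.
    """
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den
```

For instance, with x = 1, ..., 40 and errors that switch sign in blocks of ten observations, the statistic falls well below 2, whereas strictly alternating errors push it towards 4.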