Efficient Computing of Regression Diagnostics

Abstract

Multiple regression diagnostic methods have recently been developed to help data analysts identify failures of data to adhere to the assumptions that customarily accompany regression models. However, the mathematical development of regression diagnostics has not generally led to efficient computing formulas. Conflicting terminology and the use of closely related but subtly different statistics have caused confusion. This article attempts to make regression diagnostics more readily available to those who compute regressions with packaged statistics programs. We review regression diagnostic methodology, highlighting ambiguities of terminology and relationships among similar methods. We present new formulas for efficient computing of regression diagnostics. Finally, we offer specific advice on obtaining regression diagnostics from existing statistics programs, with examples drawn from Minitab and SAS.
