Influential Observations, High Leverage Points, and Outliers in Linear Regression

A bewilderingly large number of statistical quantities have been proposed to study outliers and the influence of individual observations in regression analysis. In this article we describe the interrelationships that exist among the proposed measures. An examination of these relationships leads us to conclude that only three of these measures, along with some graphical displays, can provide an analyst with a complete picture of outliers (major discrepant points) and points that excessively influence the fitted regression equation. Illustrative examples based on real data are presented.
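To make the distinction between outliers, high leverage points, and influential observations concrete, the following is a minimal sketch for a straight-line fit. It computes three commonly used diagnostics: the leverage (diagonal of the hat matrix), the internally studentized residual, and Cook's distance. These are assumed here for illustration only; the abstract does not say which three measures the article recommends, and the function name and the toy data are hypothetical.

```python
from math import sqrt


def simple_regression_diagnostics(x, y):
    """Per-observation diagnostics for the least-squares line y = a + b*x.

    Returns three lists: leverage, internally studentized residuals,
    and Cook's distances. Illustrative only; not the article's own method.
    """
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope

    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Ordinary residuals and the residual mean square.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)

    # Leverage: diagonal of the hat matrix for a one-predictor model.
    leverage = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]

    # Internally studentized residuals: residual scaled by its standard error.
    studentized = [e / sqrt(s2 * (1.0 - h)) for e, h in zip(residuals, leverage)]

    # Cook's distance combines residual size and leverage.
    cooks = [r * r * h / (p * (1.0 - h)) for r, h in zip(studentized, leverage)]

    return leverage, studentized, cooks


# Hypothetical toy data: the last point lies far from the others in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 8.0]

leverage, studentized, cooks = simple_regression_diagnostics(x, y)
for i, (h, r, d) in enumerate(zip(leverage, studentized, cooks)):
    print(f"obs {i}: leverage={h:.3f}  studentized={r:.3f}  cook={d:.3f}")
```

In this sketch a point can have high leverage because of its x value alone, a large studentized residual because of its y value alone, or a large Cook's distance when both combine to move the fitted line.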
