A comparative study on detection of influential observations in linear regression

A large number of statistics have been proposed in the literature for detecting outliers and influential observations in the linear regression model. In this paper, comparative studies are carried out to determine which of these statistics perform better than the others. The comparison comprises (i) a detailed simulation study and (ii) analyses of several data sets previously studied by other authors. Different choices of the design matrix of the regression model are considered. Design A is used to study the performance of the various statistics in detecting scale-shift outliers, while designs B and C provide information on their performance in identifying influential observations. For each statistic, we use cutoff points based on its exact distribution together with Bonferroni's inequality. The results show that the studentized residual, which is used to detect mean-shift outliers, is also appropriate for detecting scale-shift outliers. They also show that Welsch's statistic and Cook's distance are appropriate for detecting influential observations.
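The abstract names three diagnostics: the studentized residual, Cook's distance and Welsch's statistic. It also names Bonferroni-adjusted cutoffs taken from exact distributions. The block below is a minimal, standard-library-only Python sketch of these quantities for ordinary least squares. It is not the authors' code, and it assumes the usual textbook definitions:

- h_i is the leverage (diagonal of the hat matrix).
- t_i is the externally studentized residual. Under the normal mean-shift null it follows Student's t with n - p - 1 degrees of freedom.
- Cook's distance is D_i = r_i^2 h_i / (p (1 - h_i)).
- Welsch's statistic takes one commonly used form, W_i = DFFITS_i * sqrt((n - 1) / (1 - h_i)).

The paper's exact definitions, its designs A, B and C, and its cutoffs for D_i and W_i may differ, so only the Bonferroni test on t_i is sketched. All function names and the small data set are hypothetical.

```python
import math


def invert(a):
    """Invert a small square matrix (list of rows) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def t_upper_tail(x, df, steps=1000):
    """P(T > x) for Student's t with df degrees of freedom, x >= 0.

    Substituting t = sqrt(df) * tan(theta) turns the tail into
    c * integral of cos(theta)**(df - 1) over [atan(x / sqrt(df)), pi / 2],
    a smooth integrand that composite Simpson's rule handles well.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    a, b = math.atan(x / math.sqrt(df)), math.pi / 2
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for k in range(steps + 1):
        w = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += w * max(0.0, math.cos(a + k * h)) ** (df - 1)
    return c * total * h / 3


def t_upper_quantile(prob, df):
    """x such that P(T > x) = prob (0 < prob < 0.5), found by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > prob:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def regression_diagnostics(X, y):
    """Per-case OLS diagnostics; X holds n rows of p regressors (include a 1 for the intercept)."""
    n, p = len(X), len(X[0])
    xtx_inv = invert([[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)])
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    e = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    # Leverages: diagonal of the hat matrix H = X (X'X)^{-1} X'.
    h = [sum(row[i] * xtx_inv[i][j] * row[j] for i in range(p) for j in range(p)) for row in X]
    sse = sum(ei * ei for ei in e)
    s2 = sse / (n - p)
    out = []
    for ei, hi in zip(e, h):
        r = ei / math.sqrt(s2 * (1 - hi))                # internally studentized residual
        s2_del = (sse - ei * ei / (1 - hi)) / (n - p - 1)  # error variance with case deleted
        t = ei / math.sqrt(s2_del * (1 - hi))            # externally studentized residual
        cook = r * r * hi / (p * (1 - hi))               # Cook's distance
        dffits = t * math.sqrt(hi / (1 - hi))
        welsch = dffits * math.sqrt((n - 1) / (1 - hi))  # assumed form of Welsch's statistic
        out.append({"h": hi, "r": r, "t": t, "cook": cook, "dffits": dffits, "welsch": welsch})
    return out


def bonferroni_outliers(X, diags, alpha=0.05):
    """Flag cases with |t_i| > t_{1 - alpha/(2n)}(n - p - 1) (Bonferroni bound)."""
    n, p = len(X), len(X[0])
    cutoff = t_upper_quantile(alpha / (2 * n), n - p - 1)
    return cutoff, [i for i, d in enumerate(diags) if abs(d["t"]) > cutoff]


if __name__ == "__main__":
    xs = [float(v) for v in range(1, 11)]
    noise = [0.1, -0.2, 0.15, -0.05, 0.0, 0.2, -0.1, 0.05, -0.15, 0.1]
    y = [2 + 3 * x + u for x, u in zip(xs, noise)]
    y[5] += 10.0  # plant a gross outlier in case 5
    X = [[1.0, x] for x in xs]
    diags = regression_diagnostics(X, y)
    cutoff, flagged = bonferroni_outliers(X, diags)
    print(f"Bonferroni cutoff: {cutoff:.3f}, flagged cases: {flagged}")
    for i, d in enumerate(diags):
        print(i, *(f"{d[k]:.3f}" for k in ("h", "t", "cook", "welsch")))
```

The script fits a straight line to ten points with one planted shift at case 5. Only that case should exceed the Bonferroni cutoff, which is about 4.03 for 7 degrees of freedom at alpha = 0.05. Case 5 should also have the largest Cook's distance and Welsch's statistic. The Student's t tail is computed by numerical integration only to keep the sketch dependency-free. In practice, a library routine such as `scipy.stats.t.ppf`, or the statsmodels `OLSInfluence` class, would be the usual choice.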
