An evaluation of bootstrap methods for outlier detection in least squares regression

Abstract: Outlier detection is a critical part of data analysis, and Studentized residuals from least-squares fits are a very common tool for identifying discordant observations in linear regression. In this paper we propose a bootstrap approach to constructing critical points for outlier detection based on least-squares Studentized residuals, and find that this approach naturally accommodates mild departures from model assumptions, such as non-Normal error distributions. We illustrate the methodology with both a real data example and simulated data.
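
The abstract does not spell out the resampling scheme, so the following is only a minimal sketch of one plausible way to obtain a bootstrap critical point for Studentized residuals, not the authors' procedure. It assumes a residual bootstrap (resampling leverage-adjusted, centered least-squares residuals), externally Studentized residuals, and the maximum absolute Studentized residual as the test statistic. The function names `studentized_residuals` and `bootstrap_critical_value`, and the parameters `alpha`, `n_boot`, and `seed`, are hypothetical choices made for illustration.

```python
import numpy as np


def studentized_residuals(X, y):
    """Externally Studentized OLS residuals; X must include the intercept column."""
    n, p = X.shape
    Q, _ = np.linalg.qr(X)               # reduced QR: Q is n x p
    h = np.sum(Q ** 2, axis=1)           # leverages (diagonal of the hat matrix)
    resid = y - Q @ (Q.T @ y)            # least-squares residuals
    sse = resid @ resid
    # Error variance estimated with observation i left out
    s2_loo = (sse - resid ** 2 / (1.0 - h)) / (n - p - 1)
    return resid / np.sqrt(s2_loo * (1.0 - h))


def bootstrap_critical_value(X, y, alpha=0.05, n_boot=2000, seed=0):
    """Assumed scheme: residual bootstrap of the max |Studentized residual|."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)
    fitted = Q @ (Q.T @ y)
    # Leverage-adjusted residuals, centered so the resampled errors have mean zero
    r = (y - fitted) / np.sqrt(1.0 - h)
    r = r - r.mean()
    max_abs = np.empty(n_boot)
    for b in range(n_boot):
        # Rebuild a response from the fitted values plus resampled errors
        y_star = fitted + rng.choice(r, size=n, replace=True)
        max_abs[b] = np.max(np.abs(studentized_residuals(X, y_star)))
    # Upper (1 - alpha) quantile of the bootstrap distribution of the maximum
    return np.quantile(max_abs, 1.0 - alpha)


# Illustrative use on simulated data with heavy-tailed (non-Normal) errors
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=40)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.standard_t(df=5, size=40)
crit = bootstrap_critical_value(X, y, n_boot=500)
flagged = np.flatnonzero(np.abs(studentized_residuals(X, y)) > crit)
```

Here `flagged` holds the indices of observations whose absolute Studentized residual exceeds the bootstrap critical point. Because the errors are resampled from the fitted residuals rather than drawn from a Normal distribution, the critical point adapts to the observed error shape. One limitation of this particular sketch is that a gross outlier in the data also enters the resampling pool and can inflate the critical point.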
