The performance of diagnostic-robust generalized potentials for the identification of multiple high leverage points in linear regression

Leverage values are used in regression diagnostics as measures of influential observations in the $X$-space. Detecting high leverage values is crucial because they can lead to misleading conclusions about the fit of a regression model, cause multicollinearity problems, and mask and/or swamp outliers. Much work has been done on the identification of a single high leverage point, and that problem is generally believed to be largely resolved. There is, however, no general agreement among statisticians on how to detect multiple high leverage points. When a group of high leverage points is present in a data set, the commonly used diagnostic methods fail to identify them correctly, mainly because of masking and/or swamping effects. Robust alternative methods, on the other hand, can identify the high leverage points correctly, but they tend to flag too many low leverage points as high leverage points, which is also undesirable. We attempt a compromise between these two approaches. We propose an adaptive method in which the suspected high leverage points are first identified by robust methods, and any low leverage points among them are then put back into the estimation data set after diagnostic checking. The usefulness of the proposed method for detecting multiple high leverage points is studied using some well-known data sets and Monte Carlo simulations.
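The two-stage idea described above (robust screening, then diagnostic checking that returns wrongly flagged points) can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: a single predictor with an intercept, a median/MAD distance as a stand-in for the robust step, a cut-off of median plus three times the normalized MAD, and the function names `mad`, `cutoff`, `potentials`, and `adaptive_high_leverage`.

```python
from statistics import median


def mad(values):
    """Normalized median absolute deviation of a sequence."""
    values = list(values)
    m = median(values)
    return 1.4826 * median(abs(v - m) for v in values)


def cutoff(values, c=3.0):
    # Assumed rule: median + c * MAD. The abstract does not fix a cut-off.
    return median(values) + c * mad(values)


def potentials(x, deleted):
    """Leverage-type values of all points, computed from the retained set only.

    Assumes one predictor plus an intercept, so the weight of a point is
    1/n + (x_i - mean)^2 / Sxx with n, mean, Sxx taken over retained points.
    """
    kept = [v for i, v in enumerate(x) if i not in deleted]
    n = len(kept)
    mean = sum(kept) / n
    sxx = sum((v - mean) ** 2 for v in kept)
    out = []
    for i, v in enumerate(x):
        w = 1.0 / n + (v - mean) ** 2 / sxx
        # Retained points are rescaled by 1 - w; deleted points keep w itself.
        out.append(w if i in deleted else w / (1.0 - w))
    return out


def adaptive_high_leverage(x, c=3.0):
    """Return sorted indices of points flagged as high leverage."""
    # Stage 1: robust screening (median/MAD distance used as a stand-in).
    m, s = median(x), mad(x)
    dist = [abs(v - m) / s for v in x]
    limit = cutoff(dist, c)
    deleted = {i for i, d in enumerate(dist) if d > limit}

    # Stage 2: diagnostic check. While the least extreme suspect falls
    # below the cut-off, put it back into the estimation set and recompute.
    while deleted:
        p = potentials(x, deleted)
        candidate = min(deleted, key=lambda i: p[i])
        if p[candidate] > cutoff(p, c):
            break
        deleted.remove(candidate)

    # Final decision based on the potentials of the cleaned estimation set.
    p = potentials(x, deleted)
    limit = cutoff(p, c)
    return sorted(i for i, v in enumerate(p) if v > limit)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 50.0, 52.0]
    print(adaptive_high_leverage(x))
```

In this toy data set the two remote values sit close together, which is the situation where single-case diagnostics tend to suffer from masking. Computing the potentials from the retained set alone keeps the suspected points from distorting their own reference values. The sketch assumes the data are not degenerate (nonzero MAD and nonzero spread among retained points).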
