Building a robust linear model with forward selection and stepwise procedures

Abstract Classical step-by-step algorithms, such as forward selection (FS) and stepwise (SW) methods, are computationally cheap but yield poor results when the data contain outliers or other contamination. Robust model selection procedures, on the other hand, are neither computationally efficient nor scalable to high dimensions, because they require fitting a large number of submodels. Robust and computationally efficient versions of FS and SW are proposed. Because FS and SW can be expressed in terms of sample correlations, simple robustifications are obtained by replacing these correlations with their robust counterparts. A pairwise approach is used to construct the robust correlation matrix, not only because of its computational advantages over the d-dimensional approach, but also because it is more consistent with the idea of step-by-step algorithms. The proposed robust methods perform much better than standard FS and SW, and they remain computationally feasible for large, high-dimensional data sets.
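To illustrate the idea that FS needs only a correlation matrix, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The function names `winsorized_corr` and `forward_select`, the clipping constant `c=2.0`, and the simulated data are all hypothetical. The robust correlation used here (Pearson correlation after median/MAD standardization and univariate clipping) is a simple stand-in chosen for brevity; the abstract does not specify the paper's pairwise estimator. Forward selection is carried out by repeatedly picking the predictor with the largest absolute partial correlation with the response and then updating the matrix with a Schur complement step.

```python
import numpy as np

def winsorized_corr(Z, c=2.0):
    """Stand-in robust correlation (an assumption, not the paper's estimator):
    standardize each column by median and MAD, clip to [-c, c], then take
    the ordinary Pearson correlation of the clipped columns."""
    Z = np.asarray(Z, dtype=float)
    med = np.median(Z, axis=0)
    mad = 1.4826 * np.median(np.abs(Z - med), axis=0)
    W = np.clip((Z - med) / mad, -c, c)
    return np.corrcoef(W, rowvar=False)

def forward_select(R, n_steps):
    """Forward selection driven only by a correlation matrix R whose last
    row/column corresponds to the response. Returns predictor indices in
    the order they enter the model."""
    S = np.array(R, dtype=float)
    y = S.shape[0] - 1
    remaining = list(range(y))
    selected = []
    for _ in range(n_steps):
        # Absolute partial correlation of each candidate with the response,
        # given the predictors already selected.
        pc = [abs(S[j, y]) / np.sqrt(S[j, j] * S[y, y]) for j in remaining]
        k = remaining[int(np.argmax(pc))]
        selected.append(k)
        remaining.remove(k)
        # Schur complement: residual covariances after regressing on k.
        S = S - np.outer(S[:, k], S[k, :]) / S[k, k]
    return selected

# Simulated data: the response depends on predictors 0 and 2 only.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = 3.0 * X[:, 0] + 1.5 * X[:, 2] + 0.1 * rng.normal(size=n)

# Contaminate 5% of the rows: outliers in predictor 4 and in the response.
X[:10, 4] = 20.0
y[:10] = 100.0

Z = np.column_stack([X, y])
classical_sel = forward_select(np.corrcoef(Z, rowvar=False), 2)
robust_sel = forward_select(winsorized_corr(Z), 2)
print("classical:", classical_sel)
print("robust:   ", robust_sel)
```

Because each pairwise correlation is computed from two columns at a time, the cost of building the matrix grows with the number of pairs rather than requiring a joint d-dimensional robust fit. The sketch does not guard against near-collinear predictors, where `S[j, j]` can approach zero.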
