Benchmark testing of algorithms for very robust regression: FS, LMS and LTS

Very robust regression methods can resist contamination by up to 50% of outliers. Algorithms for these methods rely on selecting numerous subsamples of the data. New algorithms for the least median of squares (LMS) and least trimmed squares (LTS) estimators are proposed; improved combinatorial sampling makes them more computationally efficient. These and other publicly available algorithms are compared for outlier detection, together with their timings and the quality of the resulting estimates. An algorithm using the forward search (FS) has the best properties for both the size and the power of the outlier tests.
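
To illustrate the subsampling idea that LMS and LTS algorithms build on, the following is a minimal sketch, not the algorithms proposed in the paper. It draws random elemental subsets of p observations, fits each exactly, and keeps the fit with the smallest criterion. The function name `elemental_fit`, the number of subsamples, and the coverage h = floor((n + p + 1) / 2) are assumptions made for illustration; the improved combinatorial sampling and the forward search are not reproduced here.

```python
import numpy as np


def elemental_fit(X, y, n_subsamples=500, seed=0):
    """Approximate LMS and LTS fits from random elemental subsets.

    Illustrative sketch only: plain random sampling, no refinement steps.
    X is an (n, p) design matrix (include a column of ones for an intercept).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = (n + p + 1) // 2  # assumed coverage: just over half the observations

    best_lms, beta_lms = np.inf, None
    best_lts, beta_lts = np.inf, None

    for _ in range(n_subsamples):
        # Elemental subset: exactly p observations, fitted without error.
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])
        except np.linalg.LinAlgError:
            continue  # singular subset, draw another

        # Ordered squared residuals of all n observations for this fit.
        r2 = np.sort((y - X @ beta) ** 2)

        lms_crit = r2[h - 1]      # LMS: h-th smallest squared residual
        lts_crit = r2[:h].sum()   # LTS: sum of the h smallest squared residuals

        if lms_crit < best_lms:
            best_lms, beta_lms = lms_crit, beta
        if lts_crit < best_lts:
            best_lts, beta_lts = lts_crit, beta

    return {"lms": beta_lms, "lts": beta_lts}
```

Calling `elemental_fit(X, y)` returns a dictionary with the approximate LMS and LTS coefficient vectors. The sketch is only approximate because it examines a random selection of subsets rather than all of them, which is why the way subsamples are generated matters for both speed and estimator quality.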
