Outlier detection and least trimmed squares approximation using semi-definite programming

Robust linear regression is one of the most widely studied problems in the robust statistics community. It is often conducted via least trimmed squares (LTS), which minimizes the sum of the k smallest squared residuals. Least trimmed squares has desirable properties and forms the basis on which several recent robust methods are built, but it is computationally very expensive because of its combinatorial nature. The least trimmed squares problem is proven to be equivalent to a concave minimization problem under a simple linear constraint set. The "maximum trimmed squares" (MTS) problem is then introduced: an "almost complementary" problem that maximizes the sum of the q smallest squared residuals, in direct pursuit of the set of outliers rather than the set of clean points. Maximum trimmed squares can be formulated as a semi-definite programming problem, which can be solved efficiently in polynomial time using interior point methods. In addition, under reasonable assumptions, the maximum trimmed squares problem is guaranteed to identify outliers, no matter how extreme they are.
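To make the LTS objective and its combinatorial cost concrete, here is a minimal sketch in plain Python. It is not the semi-definite programming method described in the abstract: it is a brute-force search over all k-point subsets for a simple line fit y = a + b*x, relying on the standard fact that the LTS minimizer is the ordinary least squares fit on some k-subset. The function names (`lts_objective`, `ols_line`, `lts_brute_force`) and the toy data are assumptions made for illustration only.

```python
from itertools import combinations

def lts_objective(residuals, k):
    """Sum of the k smallest squared residuals (the LTS criterion)."""
    return sum(sorted(r * r for r in residuals)[:k])

def ols_line(xs, ys):
    """Closed-form least squares fit of y = a + b*x; None if x is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def lts_brute_force(xs, ys, k):
    """Exhaustive LTS for a line: try the OLS fit of every k-subset.

    The number of subsets is C(n, k), which is why this is only
    feasible for tiny data sets.
    """
    best = None
    for idx in combinations(range(len(xs)), k):
        fit = ols_line([xs[i] for i in idx], [ys[i] for i in idx])
        if fit is None:
            continue
        a, b = fit
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        obj = lts_objective(residuals, k)
        if best is None or obj < best[0]:
            best = (obj, a, b)
    return best

# Toy data (hypothetical): six points on y = 1 + 2x plus two gross outliers.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1, 3, 5, 7, 9, 11, 100, -50]
objective, intercept, slope = lts_brute_force(xs, ys, k=6)
```

With n = 8 and k = 6 there are only 28 subsets to try, but the count grows combinatorially with n, which is the expense that motivates looking for a polynomial-time alternative.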
