A practical approximation algorithm for the LMS line estimator

Fitting a straight line to a finite collection of points in the plane is an important problem in statistical estimation. Robust estimators are particularly important because of their lack of sensitivity to outlying data points. The basic measure of the robustness of an estimator is its breakdown point, that is, the fraction (up to 50%) of outlying data points that can corrupt the estimator. Rousseeuw's least median-of-squares (LMS) regression (line) estimator is among the best known 50% breakdown-point estimators. The best exact algorithms known for this problem run in O(n^2) time, where n is the number of data points. Because of this high running time, many practitioners prefer a simple O(n log n) Monte Carlo algorithm, which is quite efficient but provides no guarantees of accuracy (even probabilistic) unless the data set satisfies certain assumptions. In this paper, we present two algorithms in an attempt to close the gap between theory and practice. The first is a conceptually simple randomized Las Vegas approximation algorithm for LMS, which runs in O(n log n) time. However, this algorithm relies on somewhat complicated data structures to achieve its efficiency. The second is a practical randomized algorithm for LMS that uses only simple data structures. It can be run as either an exact or an approximation algorithm. Its worst-case running time is O(n^2 log n), but we present empirical evidence that its running time on realistic data sets is much better. This algorithm provides an attractive option for practitioners, combining the efficiency of a Monte Carlo algorithm with guarantees on the accuracy of the result.
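To make the LMS objective and the Monte Carlo heuristic mentioned above concrete, here is a minimal Python sketch. It is not either of the paper's two algorithms. It assumes the common random-sampling scheme in which candidate lines are drawn through random pairs of data points and scored by the median of the squared vertical residuals. The names `lms_cost`, `monte_carlo_lms`, `trials`, and `seed` are hypothetical, and the use of `statistics.median` (which averages the two middle values for even n) is an assumption; LMS definitions vary in which order statistic they use.

```python
import random
import statistics


def lms_cost(points, slope, intercept):
    """Median of squared vertical residuals for the line y = slope * x + intercept."""
    return statistics.median(
        (y - (slope * x + intercept)) ** 2 for x, y in points
    )


def monte_carlo_lms(points, trials=200, seed=0):
    """Heuristic LMS fit: try lines through random point pairs, keep the best.

    Returns (cost, slope, intercept), or None if no non-vertical pair was drawn.
    This is a sketch of the sampling idea only; it carries no accuracy guarantee.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line: not expressible as y = slope * x + intercept
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        cost = lms_cost(points, slope, intercept)  # one pass over all n points
        if best is None or cost < best[0]:
            best = (cost, slope, intercept)
    return best


if __name__ == "__main__":
    # 20 points on y = 2x + 1 plus 5 gross outliers (20% contamination).
    inliers = [(x, 2 * x + 1) for x in range(20)]
    outliers = [(1, 50), (4, -30), (7, 100), (12, -5), (15, 200)]
    print(monte_carlo_lms(inliers + outliers))
```

Because the cost is a median rather than a sum, the five outliers in the usage example do not influence the score of the true line, which illustrates the robustness property described above. The sampled lines pass through two data points, whereas an exact LMS line generally does not, so the result is only a heuristic approximation.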
