Paper information - Optimal outlier removal in high-dimensional spaces

Optimal outlier removal in high-dimensional spaces

We study the problem of finding an outlier-free subset of a set of points (or a probability distribution) in n-dimensional Euclidean space. As in [BFKV 99], a point x is defined to be a β-outlier if there exists some direction w in which its squared distance from the mean along w is greater than β times the average squared distance from the mean along w. Our main theorem is that for any ε > 0, there exists a (1 − ε) fraction of the original distribution that has no O((n/ε)(b + log(n/ε)))-outliers, improving on the previous bound of O(n^7 b/ε). This is asymptotically the best possible, as shown by a matching lower bound. The theorem is constructive, and results in a 1/(1 − ε)-approximation to the following optimization problem: given a distribution µ (i.e., the ability to sample from it) and a parameter ε > 0, find the minimum β for which there exists a subset of probability at least (1 − ε) with no β-outliers.
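The β-outlier definition above can be checked directly on a finite point set. The following is a minimal sketch, not the paper's algorithm: it relies on the standard fact that the largest ratio of squared deviation along w to average squared deviation along w, taken over all directions w, equals the quadratic form (x − mean)ᵀ M⁺ (x − mean), where M is the second-moment matrix about the mean. The function names `outlier_scores` and `beta_outliers` are hypothetical, and the uniform distribution over the input points is an assumption.

```python
import numpy as np

def outlier_scores(points):
    """For each point, return the largest ratio over directions w of
    (squared deviation from the mean along w) to
    (average squared deviation from the mean along w).
    A point is a beta-outlier exactly when its score exceeds beta."""
    X = np.asarray(points, dtype=float)
    centered = X - X.mean(axis=0)
    # Second-moment matrix about the mean (uniform weights assumed).
    M = centered.T @ centered / len(X)
    # The pseudo-inverse handles point sets lying in a lower-dimensional
    # subspace; directions with zero variance contribute nothing.
    M_pinv = np.linalg.pinv(M)
    # Quadratic form (x - mean)^T M^+ (x - mean) for every row at once.
    return np.einsum("ij,jk,ik->i", centered, M_pinv, centered)

def beta_outliers(points, beta):
    """Boolean mask marking the beta-outliers of the point set."""
    return outlier_scores(points) > beta

if __name__ == "__main__":
    pts = [[0.0], [0.0], [0.0], [0.0], [10.0]]
    print(outlier_scores(pts))      # scores 0.25 for the four zeros, 4.0 for the last point
    print(beta_outliers(pts, 2.0))  # only the last point is a 2-outlier
```

The scores always average to the rank of M, which is at most n, so by Markov's inequality fewer than an n/β fraction of the points can be β-outliers. Removing the flagged points changes the mean and M, so a subset obtained this way is not automatically outlier-free.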

Santosh S. Vempala | John Dunagan

[1] Victor J. Yohai, et al. The Behavior of the Stahel-Donoho Robust Multivariate Estimator, 1995.

[2] Laurie Davies, et al. The identification of multiple outliers, 1993.

[3] Santosh S. Vempala, et al. Semi-definite relaxations for minimum bandwidth and other vertex-ordering problems, 1998, STOC '98.

[4] Santosh S. Vempala, et al. Optimal outlier removal in high-dimensional, 2001, STOC '01.

[5] D. Donoho, et al. Breakdown Properties of Location Estimates Based on Halfspace Depth and Projected Outlyingness, 1992.

[6] Alan M. Frieze, et al. A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions, 1996, Algorithmica.

[7] Ursula Gather, et al. The maximum asymptotic bias of outlier identifiers, 1998.

[8] Miklós Simonovits, et al. Isoperimetric problems for convex bodies and a localization lemma, 1995, Discret. Comput. Geom.

[9] M. Simonovits, et al. Random walks and an O*(n^5) volume algorithm for convex bodies, 1997.

[10] Charles R. Johnson, et al. Matrix analysis, 1985, Statistical Inference for Engineers and Data Scientists.

[11] David Eppstein, et al. Approximating center points with iterative Radon points, 1996, Int. J. Comput. Geom. Appl.

[12] David Eppstein, et al. Approximating center points with iterated Radon points, 1993, SCG '93.