Analysis of Incomplete Data and an Intrinsic-Dimension Helly Theorem

The analysis of incomplete data is a long-standing challenge in practical statistics. When, as is typical, data objects are represented by points in ℝ^d, incomplete data objects correspond to affine subspaces (lines or Δ-flats). With this motivation we study the problem of finding the minimum intersection radius r(ℒ) of a set of lines or Δ-flats ℒ: the least r such that there is a ball of radius r intersecting every flat in ℒ. Known algorithms for finding the minimum enclosing ball for a point set (or clustering by several balls) do not easily extend to higher-dimensional flats, primarily because “distances” between flats do not satisfy the triangle inequality. In this paper we show how to restore geometry (i.e., a substitute for the triangle inequality) to the problem, through a new analog of Helly’s theorem. This “intrinsic-dimension” Helly theorem states: for any family ℒ of Δ-dimensional convex sets in a Hilbert space, there exist Δ+2 sets ℒ′ ⊆ ℒ such that r(ℒ) ≤ 2r(ℒ′). Based upon this we present an algorithm that computes a (1+ε)-core set ℒ′ ⊆ ℒ, |ℒ′| = O(Δ^4/ε), such that the ball centered at a point c with radius (1+ε)r(ℒ′) intersects every element of ℒ. The running time of the algorithm is O(n^(Δ+1) d poly(Δ/ε)). For the case of lines or line segments (Δ=1), the (expected) running time of the algorithm can be improved to O(n d poly(1/ε)). We note that the size of the core set depends only on the dimension of the input objects and is independent of the input size n and the dimension d of the ambient space.
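
To make the failure of the triangle inequality concrete, the following is a minimal sketch for the case of lines (Δ=1). It is an illustration only and not the algorithm of the paper. The line representation (a point p and a direction u), the helper names, and the three example lines A, B, C are all assumptions introduced here. The sketch computes the smallest distance between two lines and evaluates, for a candidate center c, the radius a ball at c needs in order to touch every line; the minimum of that quantity over all c is r(ℒ).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def point_line_distance(c, line):
    """Distance from point c to the line {p + t*u : t real}."""
    p, u = line
    w = sub(c, p)
    t = dot(w, u) / dot(u, u)          # parameter of the orthogonal projection
    residual = [wi - t * ui for wi, ui in zip(w, u)]
    return math.sqrt(dot(residual, residual))

def line_line_distance(line_a, line_b):
    """Smallest distance between two lines in any dimension."""
    (p, u), (q, v) = line_a, line_b
    w = sub(p, q)
    a, b, c = dot(u, u), dot(u, v), dot(v, v)
    d, e = dot(u, w), dot(v, w)
    det = a * c - b * b                # zero exactly when the lines are parallel
    if det < 1e-12 * a * c:
        return point_line_distance(p, line_b)
    s = (b * e - c * d) / det          # closest point on line_a is p + s*u
    t = (a * e - b * d) / det          # closest point on line_b is q + t*v
    closest_a = [pi + s * ui for pi, ui in zip(p, u)]
    closest_b = [qi + t * vi for qi, vi in zip(q, v)]
    diff = sub(closest_a, closest_b)
    return math.sqrt(dot(diff, diff))

def radius_needed_at(c, lines):
    """Radius a ball centered at c needs to intersect every line.
    Minimizing this over c gives r(L); no minimization is attempted here."""
    return max(point_line_distance(c, line) for line in lines)

# Hypothetical example in R^3: A and C are parallel at distance 2,
# and B crosses both of them.
A = ([0, 0, 0], [1, 0, 0])
B = ([0, 0, 0], [0, 0, 1])
C = ([0, 0, 2], [1, 0, 0])

d_ab = line_line_distance(A, B)        # A and B intersect
d_bc = line_line_distance(B, C)        # B and C intersect
d_ac = line_line_distance(A, C)        # A and C never meet
print(d_ab, d_bc, d_ac)
print(d_ac > d_ab + d_bc)              # triangle inequality is violated
print(radius_needed_at([0, 0, 1], [A, B, C]))
```

In this example the pairwise distances say nothing useful about A and C: both are at distance zero from B yet lie two units apart, which is the obstacle the abstract describes for adapting point-based enclosing-ball methods to flats.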
