Metric embeddings with outliers

We initiate the study of metric embeddings with outliers. Given a finite metric space, we wish to remove a small set of points and find either an isometric or a low-distortion embedding of the remaining points into some host metric space. This natural problem captures scenarios in which a small fraction of the input points corresponds to noise. We present polynomial-time approximation algorithms for computing outlier embeddings into Euclidean space, trees, and ultrametrics. For isometric embeddings, the objective is to minimize the number of outliers; for non-isometric embeddings, we have a bi-criteria optimization problem in which the goal is to minimize both the number of outliers and the distortion. We complement our approximation algorithms with NP-hardness results for these problems. We conclude with a brief experimental evaluation of our non-isometric outlier embedding on synthetic and real-world data sets.
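To make the isometric case concrete, the following is a minimal sketch for one host class, ultrametrics, and is not necessarily the algorithm of the paper. It relies on the standard fact that a finite metric is an ultrametric exactly when every triple satisfies the strong triangle inequality, so a set of outliers must hit every violating triple. Discarding all three points of each violating triple found therefore removes at most three times the minimum number of outliers. The function names, the distance-matrix input format, and the tolerance parameter are assumptions made for illustration.

```python
from itertools import combinations


def violates_ultrametric(d, x, y, z, tol=1e-9):
    """Return True if the triple (x, y, z) breaks the strong triangle inequality.

    In an ultrametric, the two largest pairwise distances of every triple
    are equal, so a violation means the largest strictly exceeds the second.
    """
    _, second, largest = sorted((d[x][y], d[y][z], d[x][z]))
    return largest - second > tol


def greedy_ultrametric_outliers(d):
    """Greedy outlier removal for isometric embedding into an ultrametric.

    `d` is a symmetric distance matrix given as a list of lists (an assumed
    input format). Whenever a violating triple is found among the surviving
    points, all three of its points are discarded. The discarded triples are
    pairwise disjoint and any valid outlier set contains at least one point
    of each, which gives the factor-3 guarantee.
    """
    alive = set(range(len(d)))
    outliers = set()
    found = True
    while found:
        found = False
        for x, y, z in combinations(sorted(alive), 3):
            if violates_ultrametric(d, x, y, z):
                outliers |= {x, y, z}
                alive -= {x, y, z}
                found = True
                break
    return outliers


# Points 0, 1, 2 form an ultrametric; point 3 acts as noise.
noisy = [
    [0, 1, 2, 1],
    [1, 0, 2, 2],
    [2, 2, 0, 3],
    [1, 2, 3, 0],
]
print(greedy_ultrametric_outliers(noisy))  # {0, 1, 3}
```

In this example the optimum removes only point 3, while the greedy sketch removes three points, matching the worst case of the factor-3 bound. The non-isometric, bi-criteria setting described in the abstract needs a different approach, since a small distortion must be tolerated rather than ruled out.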
