Approximate Greedy Clustering and Distance Selection for Graph Metrics

In this paper, we consider two important problems defined on finite metric spaces, and provide efficient new algorithms and approximation schemes for these problems on inputs given as graph shortest-path metrics or high-dimensional Euclidean metrics. The first of these problems is the greedy permutation (or farthest-first traversal) of a finite metric space: a permutation of the points of the space in which each point is as far as possible from all previous points. We describe randomized algorithms to find $(1+\varepsilon)$-approximate greedy permutations of any graph with $n$ vertices and $m$ edges in expected time $O(\varepsilon^{-1}(m+n)\log n\log(n/\varepsilon))$, and to find $(1+\varepsilon)$-approximate greedy permutations of points in high-dimensional Euclidean spaces in expected time $O(\varepsilon^{-2} n^{1+1/(1+\varepsilon)^2 + o(1)})$. Additionally, we describe a deterministic algorithm to find exact greedy permutations of any graph with $n$ vertices and treewidth $O(1)$ in worst-case time $O(n^{3/2}\log^{O(1)} n)$. The second problem is distance selection: given $k \in [\binom{n}{2}]$, that is, an integer $k$ with $1 \le k \le \binom{n}{2}$, compute the $k$th smallest distance in the given metric space. We show that for planar graph metrics one can approximate this distance, up to a constant factor, in near-linear time.
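To make the two problem statements concrete, the following Python sketch computes an exact greedy permutation and an exact $k$th smallest distance on a small graph metric by brute force. It is only a baseline under stated assumptions and is not one of the algorithms announced above: it runs one Dijkstra search per vertex, so it is far slower than the stated bounds. The adjacency-list format, the function names, and the tie-breaking rule (largest vertex label wins) are illustrative choices, and the graph is assumed to be connected, undirected, and to have nonnegative edge weights.

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path distances from source; adj maps vertex -> [(neighbor, weight)]."""
    dist = {v: float("inf") for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def greedy_permutation(adj, start):
    """Exact farthest-first traversal of the shortest-path metric (brute force)."""
    # nearest[v] = distance from v to the closest vertex selected so far
    nearest = {v: float("inf") for v in adj}
    order = []
    current = start
    for _ in range(len(adj)):
        order.append(current)
        for v, d in dijkstra(adj, current).items():
            if d < nearest[v]:
                nearest[v] = d
        # Next vertex is the one farthest from all selected vertices;
        # ties are broken by the largest label (an arbitrary choice).
        current = max(adj, key=lambda v: (nearest[v], v))
    return order

def kth_smallest_distance(adj, k):
    """Exact k-th smallest pairwise distance, 1 <= k <= n(n-1)/2 (brute force)."""
    vertices = list(adj)
    distances = []
    for i, u in enumerate(vertices):
        dist = dijkstra(adj, u)
        # Count each unordered pair once.
        distances.extend(dist[v] for v in vertices[i + 1:])
    distances.sort()
    return distances[k - 1]

# A path a - b - c - d - e with unit edge weights.
path = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 1.0)],
    "d": [("c", 1.0), ("e", 1.0)],
    "e": [("d", 1.0)],
}
print(greedy_permutation(path, "a"))
print(kth_smallest_distance(path, 5))
```

On the path graph, the traversal started at `a` first jumps to the far endpoint `e`, then to the midpoint `c`, which illustrates how each prefix of a greedy permutation spreads out over the space.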
