Near-Optimal (Euclidean) Metric Compression

The metric sketching problem is defined as follows. Given a metric on n points and ϵ > 0, we wish to produce a small data structure (a sketch) that, given any pair of point indices, recovers the distance between the points up to a 1 + ϵ distortion. In this paper we consider metrics induced by the ℓ2 and ℓ1 norms whose spread (the ratio of the diameter to the closest-pair distance) is bounded by Φ > 0. A well-known dimensionality reduction theorem due to Johnson and Lindenstrauss yields a sketch of total size O(ϵ⁻² log(Φn) n log n) bits, i.e., O(ϵ⁻² log(Φn) log n) bits per point. We show that this bound is not optimal and can be substantially improved to O(ϵ⁻² log(1/ϵ) · log n + log log Φ) bits per point. Furthermore, we show that our bound is tight up to a factor of log(1/ϵ). We also consider sketching of general metrics and provide a sketch of size O(n log(1/ϵ) + log log Φ) bits per point, which we show is optimal.
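To get a feel for the gap between the bounds stated above, the following minimal Python sketch evaluates each expression numerically. It is only an illustration under assumptions that are not in the abstract: every constant hidden by the O(·) notation is set to 1, all logarithms are taken base 2, and the function names are hypothetical. The numbers are therefore useful for comparing growth rates, not as actual sketch sizes.

```python
import math


def jl_bits_per_point(n: int, eps: float, phi: float) -> float:
    """Johnson-Lindenstrauss baseline: eps^-2 * log(phi * n) * log(n) bits per point.

    Assumption: hidden constant is 1 and logs are base 2.
    """
    return eps ** -2 * math.log2(phi * n) * math.log2(n)


def jl_total_bits(n: int, eps: float, phi: float) -> float:
    """Total size of the baseline sketch: n times the per-point cost."""
    return n * jl_bits_per_point(n, eps, phi)


def improved_bits_per_point(n: int, eps: float, phi: float) -> float:
    """Improved bound: eps^-2 * log(1/eps) * log(n) + log log(phi) bits per point.

    Assumption: phi > 2 so that log log(phi) is positive.
    """
    return eps ** -2 * math.log2(1 / eps) * math.log2(n) + math.log2(math.log2(phi))


def general_metric_bits_per_point(n: int, eps: float, phi: float) -> float:
    """General-metric bound: n * log(1/eps) + log log(phi) bits per point."""
    return n * math.log2(1 / eps) + math.log2(math.log2(phi))


if __name__ == "__main__":
    # Powers of two keep every logarithm exact in floating point.
    n, eps, phi = 2 ** 20, 2 ** -3, 2 ** 16
    print(jl_bits_per_point(n, eps, phi))              # 46080.0
    print(improved_bits_per_point(n, eps, phi))        # 3844.0
    print(general_metric_bits_per_point(n, eps, phi))  # 3145732.0
```

With these illustrative parameters the log(Φn) factor of the baseline (36) is replaced by log(1/ϵ) (3), and the dependence on the spread shrinks to an additive log log Φ term. The general-metric bound grows linearly in n per point, which is why it is far larger here than either norm-induced bound.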
