K-median clustering, model-based compressive sensing, and sparse recovery for earth mover distance

We initiate the study of sparse recovery problems under the Earth-Mover Distance (EMD). Specifically, we design a distribution over m × n matrices A such that for any x, given Ax, we can recover a k-sparse approximation to x under the EMD. One construction yields m = O(k log(n/k)) and a 1 + ε approximation factor, which matches the best achievable bound for other error measures, such as the ℓ1 norm. Our algorithms are obtained by exploiting novel connections to other problems and areas, such as streaming algorithms for k-median clustering and model-based compressive sensing. We also provide novel algorithms and results for the latter problems.
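To illustrate why EMD is a different error measure from the ℓ1 norm, here is a minimal sketch in Python. It assumes a simplified one-dimensional setting (nonnegative vectors of equal total mass, with ground distance |i - j| between positions i and j), which is not necessarily the setting of the paper. The function names `emd_1d` and `l1` are hypothetical and chosen only for this illustration. In this special case the EMD equals the sum of absolute differences between the two cumulative sums.

```python
from itertools import accumulate


def emd_1d(x, y):
    """EMD between two nonnegative vectors of equal total mass on a line.

    Assumption for this sketch: the ground distance between positions
    i and j is |i - j|. Under that assumption the EMD equals the sum of
    absolute differences between the cumulative sums of x and y.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if abs(sum(x) - sum(y)) > 1e-9:
        raise ValueError("vectors must have the same total mass")
    return sum(abs(a - b) for a, b in zip(accumulate(x), accumulate(y)))


def l1(x, y):
    """Plain l1 distance, for comparison."""
    return sum(abs(a - b) for a, b in zip(x, y))


# A single spike of mass 3, moved by one position and by three positions.
x = [0, 3, 0, 0, 0]
near = [0, 0, 3, 0, 0]
far = [0, 0, 0, 0, 3]

print(l1(x, near), l1(x, far))          # the l1 norm cannot tell them apart
print(emd_1d(x, near), emd_1d(x, far))  # EMD grows with the distance moved
```

The ℓ1 distance is the same for both shifted vectors, while the EMD scales with how far the mass moved. This is the sense in which a k-sparse approximation that is good under EMD may place its nonzero entries near, rather than exactly at, the heavy coordinates of x.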
