On Deterministic Sketching and Streaming for Sparse Recovery and Norm Estimation

We study classic streaming and sparse recovery problems using deterministic linear sketches, including l1/l1 and linf/l1 sparse recovery (the latter also known as l1-heavy hitters), norm estimation, and approximate inner product. We focus on devising a fixed matrix A in R^{m x n} and a deterministic recovery/estimation procedure that work for all possible input vectors simultaneously. Our results improve upon existing work. Our main contributions are:

* A proof that linf/l1 sparse recovery and inner product estimation are equivalent, and that incoherent matrices can be used to solve both problems. Our upper bound for the number of measurements is m = O(eps^{-2} * min{log n, (log n / log(1/eps))^2}). We can also obtain fast sketching and recovery algorithms by making use of the Fast Johnson-Lindenstrauss transform. Both our running times and number of measurements improve upon previous work. We can also obtain better error guarantees than previous work in terms of a smaller tail of the input vector.

* A new lower bound for the number of linear measurements required to solve l1/l1 sparse recovery. We show that Omega(k/eps^2 + k log(n/k)/eps) measurements are required to recover an x' with |x - x'|_1 <= (1+eps) |x_{tail(k)}|_1, where x_{tail(k)} is x projected onto all but its k largest coordinates in magnitude.

* A tight bound of m = Theta(eps^{-2} log(eps^2 n)) on the number of measurements required to solve deterministic norm estimation, i.e., to recover |x|_2 +/- eps |x|_1.

For all the problems we study, tight bounds on the randomized complexity are already known from previous work, except in the case of l1/l1 sparse recovery, where a nearly tight bound is known. Our work thus aims to study the deterministic complexities of these problems.
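To make the role of incoherent matrices concrete, the following is a minimal Python sketch of a point query, not the construction or recovery procedure of the paper. It assumes a standard polynomial-evaluation (Reed-Solomon style) matrix chosen here only for illustration: for a prime q and degree bound d, each column corresponds to a polynomial of degree less than d over F_q, has unit l2 norm, and any two distinct columns have inner product at most eps = (d-1)/q. For such a matrix, the estimate <A_i, Ax> equals x_i plus cross terms bounded by eps |x|_1, which is the linf/l1 guarantee. The names `rs_incoherent_columns`, `sketch`, and `point_query` are hypothetical helpers.

```python
from itertools import product
from math import sqrt


def rs_incoherent_columns(q, d):
    """One column per polynomial of degree < d over F_q (q assumed prime).

    Each column is stored sparsely as its q nonzero row indices in a
    q*q-dimensional space; every nonzero entry equals 1/sqrt(q), so each
    column has unit l2 norm. Two distinct polynomials agree on at most
    d-1 points, so distinct columns have inner product <= (d-1)/q.
    """
    cols = []
    for coeffs in product(range(q), repeat=d):
        rows = []
        for a in range(q):
            val = 0
            for c in reversed(coeffs):  # Horner evaluation of p(a) mod q
                val = (val * a + c) % q
            rows.append(a * q + val)  # row index encodes the pair (a, p(a))
        cols.append(rows)
    return cols


def sketch(cols, x, q):
    """Compute y = A x for the sparse column representation above."""
    y = [0.0] * (q * q)
    scale = 1.0 / sqrt(q)
    for j, xj in enumerate(x):
        if xj != 0:
            for r in cols[j]:
                y[r] += xj * scale
    return y


def point_query(cols, y, i, q):
    """Estimate x_i as the inner product of column i with the sketch y."""
    return sum(y[r] for r in cols[i]) / sqrt(q)


# Illustrative parameters (assumptions): n = q**d = 343 columns, m = q*q = 49 rows.
q, d = 7, 3
cols = rs_incoherent_columns(q, d)
eps = (d - 1) / q

x = [0.0] * len(cols)
x[5], x[100], x[200] = 10.0, -3.0, 1.5
l1 = sum(abs(v) for v in x)

y = sketch(cols, x, q)
worst = max(abs(point_query(cols, y, i, q) - x[i]) for i in range(len(x)))
print("worst point-query error:", worst, "bound eps*|x|_1:", eps * l1)
```

The sketch is deterministic and the error bound holds for every input vector at once, which is the "for all" flavor of guarantee studied here. The number of rows in this toy construction is not meant to match the measurement bounds stated above.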
