A unifying framework for ℓ0-sampling algorithms

The problem of building an ℓ0-sampler is to sample near-uniformly from the support set of a dynamic multiset. This problem has a variety of applications within data analysis, computational geometry and graph algorithms. In this paper, we abstract a set of steps for building an ℓ0-sampler, based on sampling, recovery and selection. We analyze the implementation of an ℓ0-sampler within this framework, and show how prior constructions of ℓ0-samplers can all be expressed in terms of these steps. Our experimental contribution is to provide a first detailed study of the accuracy and computational cost of ℓ0-samplers.
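
The three steps named in the abstract (sampling, recovery and selection) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the class names `OneSparseRecovery` and `L0Sampler`, the linear hash modulo a Mersenne prime, the use of 1-sparse recovery with a polynomial fingerprint, and the rule of returning the first level that recovers successfully. Items are assumed to be non-negative integers, and updates may be positive or negative.

```python
import random

P = (1 << 61) - 1  # Mersenne prime: modulus for the hash and the fingerprint


class OneSparseRecovery:
    """Recovery step: returns (item, count) when exactly one item is nonzero."""

    def __init__(self, z):
        self.z = z            # random evaluation point for the fingerprint
        self.weight = 0       # sum of counts
        self.index_sum = 0    # sum of count * item
        self.fingerprint = 0  # sum of count * z**item, taken mod P

    def update(self, item, delta):
        self.weight += delta
        self.index_sum += delta * item
        self.fingerprint = (self.fingerprint + delta * pow(self.z, item, P)) % P

    def recover(self):
        # A 1-sparse vector has index_sum = count * item, so the division is exact.
        if self.weight == 0 or self.index_sum % self.weight != 0:
            return None
        item = self.index_sum // self.weight
        if item < 0:
            return None
        # The fingerprint check rejects non-1-sparse vectors with high probability.
        if (self.weight * pow(self.z, item, P)) % P != self.fingerprint:
            return None
        return item, self.weight


class L0Sampler:
    def __init__(self, levels=32, seed=0):
        rng = random.Random(seed)
        self.levels = levels
        self.a = rng.randrange(1, P)  # hash h(x) = (a*x + b) mod P
        self.b = rng.randrange(P)
        self.sketches = [OneSparseRecovery(rng.randrange(1, P))
                         for _ in range(levels)]

    def _depth(self, item):
        # Sampling step: level j keeps an item when h(item) < P / 2**j,
        # so each level holds roughly half the items of the previous one.
        h = (self.a * item + self.b) % P
        depth = 0
        while depth + 1 < self.levels and h < (P >> (depth + 1)):
            depth += 1
        return depth

    def update(self, item, delta):
        for j in range(self._depth(item) + 1):
            self.sketches[j].update(item, delta)

    def sample(self):
        # Selection step: scan from the sparsest level and return the first
        # successful recovery; None means this copy of the sketch failed.
        for sketch in reversed(self.sketches):
            result = sketch.recover()
            if result is not None:
                return result
        return None
```

A call such as `L0Sampler(seed=1)` followed by `update(5, 3)`, `update(9, 2)` and `update(9, -2)` leaves item 5 as the only element of the support, so `sample()` returns `(5, 3)`. With a larger support a single copy can return `None`; running several independently seeded copies is one common way to reduce that failure probability.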

Graham Cormode | Donatella Firmani
