Counting by Coin Tossings

This text is an informal review of several randomized algorithms that have appeared over the past two decades and have proved instrumental in efficiently extracting quantitative characteristics of very large data sets. The algorithms are by nature probabilistic and based on hashing. They exploit properties of simple discrete probabilistic models, and their design is tightly coupled with their analysis, which is itself often founded on methods from analytic combinatorics. Singularly efficient solutions have been found that defy the information-theoretic lower bounds applicable to deterministic algorithms. Characteristics such as the total number of elements, the cardinality (the number of distinct elements), frequency moments, and unbiased samples can be gathered with little loss of information and only a small probability of failure. The algorithms are applicable to traffic monitoring in networks, to database query optimization, and to some of the basic tasks of data mining. They apply to massive data streams and in many cases require strictly minimal auxiliary storage.
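Two of the simplest ideas in this family can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the review: `morris_count` and `linear_counting` are hypothetical names. The first function counts the total number of elements in a small register by coin tossing, assuming Morris-style approximate counting with base 2. The second estimates cardinality by hashing, using linear counting with a bitmap of `m` bits, with BLAKE2b chosen arbitrarily as the hash function.

```python
import hashlib
import math
import random


def morris_count(num_events: int, rng: random.Random) -> float:
    """Approximate counting, Morris style, with base 2.

    Only a small exponent c is stored. Each event increments c with
    probability 2**-c (a biased coin toss), so c grows roughly like
    log2(n). Since E[2**c] = n + 1 after n events, 2**c - 1 is an
    unbiased estimate of n.
    """
    c = 0
    for _ in range(num_events):
        # Heads with probability 2**-c: only then does the register grow.
        if rng.random() < 2.0 ** -c:
            c += 1
    return 2.0 ** c - 1


def linear_counting(items, m: int = 1 << 14) -> float:
    """Estimate the number of distinct items with an m-bit bitmap.

    Each item is hashed to one of m positions, so duplicates set the same
    bit. If a fraction V of the bits is still zero, n is estimated as
    -m * ln(V). Assumption: BLAKE2b stands in for an ideal random hash.
    """
    bitmap = bytearray(m)  # one byte per bit, for readability, not compactness
    for item in items:
        digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
        bitmap[int.from_bytes(digest, "big") % m] = 1
    zeros = m - sum(bitmap)
    if zeros == 0:
        raise ValueError("bitmap saturated; use a larger m")
    return -m * math.log(zeros / m)


if __name__ == "__main__":
    rng = random.Random(2024)
    # A single Morris counter is noisy, so average a few hundred of them.
    runs = [morris_count(1000, rng) for _ in range(400)]
    print("Morris, n = 1000:", sum(runs) / len(runs))
    # 10000 stream items but only 1000 distinct values.
    print("Linear counting:", linear_counting(i % 1000 for i in range(10000)))
```

Because the variance of 2**c - 1 after n events is n(n - 1)/2, a single counter has a standard deviation of roughly 0.71 n; averaging independent counters, as in the usage lines, trades memory for accuracy. The linear-counting bitmap, by contrast, must grow in proportion to the expected cardinality.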
