How to Count Quickly and Accurately: A Unified Analysis of Probabilistic Counting and Other Related Problems

We consider a class of probabilistic counting algorithms, parameterized by an integer d ≥ 0, that estimate the number of elements N in a large set. Our algorithms generalize an idea of Flajolet and Martin, who limited themselves to the case d = 0. As noted by Brassard and Bratley, “it is far from obvious how to carry out a more precise analysis of the unbiased estimate of N ...”. We present a novel and complete analysis of these new counting algorithms that, to the best of our knowledge, cannot be obtained by extending the analysis of Flajolet and Martin. We present results concerning the average value, the variance, and the limiting generating function of an estimate of N. Moreover, our approach is not limited to probabilistic counting algorithms: it can be applied to the investigation of several other “splitting algorithms”, such as selecting the loser within a group of people, estimating the number of questions necessary to identify the number of distinct objects, searching algorithms based on digital tries, approximate counting, electing d finalists in a contest (cf. polling system), and so forth.
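For orientation, the base case d = 0 attributed above to Flajolet and Martin can be sketched as follows. This is a minimal illustration and not the paper's generalized algorithm: the function names (`rho`, `fm_estimate`), the choice of BLAKE2b as the hash, the 64-bit width, and the use of a single bitmap are all assumptions made for the example. The constant 0.77351 is the correction factor from the Flajolet and Martin paper.

```python
import hashlib

PHI = 0.77351  # correction constant from Flajolet and Martin (1985)


def _hash64(item):
    """Map an item to a 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(repr(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def rho(y, width=64):
    """Index of the least significant 1-bit of y, or `width` if y == 0."""
    if y == 0:
        return width
    return (y & -y).bit_length() - 1


def fm_estimate(items, width=64):
    """Estimate the number of distinct items with one bitmap (d = 0 case)."""
    bitmap = 0
    for item in items:
        r = rho(_hash64(item), width)
        if r < width:
            bitmap |= 1 << r  # record that bit position r was observed
    # R is the position of the lowest bit that was never set.
    R = 0
    while bitmap & (1 << R):
        R += 1
    return (2 ** R) / PHI


if __name__ == "__main__":
    # Duplicates do not change the bitmap, so they do not change the estimate.
    print(fm_estimate(list(range(10_000)) * 3))
```

A single bitmap yields only a coarse estimate (it is always a power of two divided by the constant) and is unreliable for very small sets; practical variants average over many bitmaps.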

[1] Helmut Prodinger, et al. Approximate counting: an alternative approach, 1991, RAIRO Theor. Informatics Appl.

[2] Philippe Flajolet, et al. Probabilistic Counting Algorithms for Data Base Applications, 1985, J. Comput. Syst. Sci.

[3] W. Szpankowski. Solution of a linear recurrence equation arising in the analysis of some algorithms, 1987.

[4] Wojciech Szpankowski, et al. Patricia tries again revisited, 1990, JACM.

[5] P. Flajolet, et al. Some Uses of the Mellin Integral Transform in the Analysis of Algorithms, 1985.

[6] W. Szpankowski, et al. Yet Another Application of a Binomial Recurrence, 1988.

[7] R. Fisher. The Advanced Theory of Statistics, 1943, Nature.

[8] Philippe Flajolet, et al. Estimating the multiplicities of conflicts to speed their resolution in multiple access channels, 1987, JACM.

[9] Henry C. Thacher, et al. Applied and Computational Complex Analysis, 1988.

[10] Ulrich Schmid, et al. The Average CRI-Length of a Tree Collision Resolution Algorithm in Presence of Multiplicity-Dependent Capture Effects, 1992, ICALP.

[11] Helmut Prodinger, et al. How to select a loser, 1993, Discret. Math.

[12] Boris G. Pittel, et al. How many random questions are necessary to identify n distinct objects?, 1990, J. Comb. Theory, Ser. A.

[13] Donald E. Knuth, et al. The art of computer programming: sorting and searching (volume 3), 1973.

[14] Philippe Flajolet, et al. Singularity Analysis of Generating Functions, 1990, SIAM J. Discret. Math.

[15] Robert H. Morris, et al. Counting large numbers of events in small registers, 1978, CACM.

[16] Donald Ervin Knuth, et al. The Art of Computer Programming, 1968.

[17] Helmut Prodinger, et al. On the analysis of probabilistic counting, 1990.

[18] Philippe Jacquet, et al. Ultimate Characterizations of the Burst Response of an Interval Searching Algorithm: A Study of a Functional Equation, 1989, SIAM J. Comput.

[19] Gilles Brassard, et al. Algorithmics - theory and practice, 1988.

[20] Philippe Flajolet, et al. Generalized Digital Trees and Their Difference-Differential Equations, 1992, Random Struct. Algorithms.

[21] Philippe Jacquet, et al. Limiting Distribution for the Depth in Patricia Tries, 1993, SIAM J. Discret. Math.