Limit Laws for Heights in Generalized Tries and PATRICIA Tries

Wojciech Szpankowski
Department of Computer Science, Purdue University, W. Lafayette, IN 47907, U.S.A.
spa@cs.purdue.edu

We consider digital trees such as (generalized) tries and PATRICIA tries, built from $n$ random strings generated by an unbiased memoryless source (i.e., all symbols are equally likely). We study limit laws of the height, which is defined as the length of the longest path in such trees. It turns out that this height also represents the number of random questions required to recognize $n$ distinct objects. We shall identify three natural regions of the height distribution. For tries, in the region where most of the probability mass is concentrated, the asymptotic distribution is of extreme value type (i.e., a double exponential distribution). Surprisingly enough, the height of the PATRICIA trie behaves quite differently in this region: it exhibits an exponential of a Gaussian distribution (with an oscillating term) around the most probable value $k_1 = \lfloor \log_2 n + \sqrt{2 \log_2 n} - \tfrac{3}{2} \rfloor + 1$. In fact, the asymptotic distribution of the PATRICIA height concentrates on one or two points. For most $n$ all the mass is concentrated at $k_1$; however, there exist subsequences of $n$ such that the mass is on the two points $k_1 - 1$ and $k_1$, or $k_1$ and $k_1 + 1$. We derive these results by a combination of analytic methods such as generating functions, the Mellin transform, and the saddle point method, and ideas of applied mathematics such as linearization, asymptotic matching, and the WKB method. We also present some numerical verification of our results.
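The contrast between the two heights can be explored empirically. The following is a minimal simulation sketch, not the analytic method of the paper. It assumes binary strings truncated to 64 bits (a stand-in for the infinite strings of the model), and the names `heights`, `k1`, and `simulate` are hypothetical helpers introduced only for illustration. The function `k1` encodes the most probable value as floor(log2 n + sqrt(2 log2 n) - 3/2) + 1, which is taken here as the reading of the abstract's formula and should be treated as an assumption.

```python
import math
import random
from collections import Counter


def heights(keys, nbits):
    """Return (trie_height, patricia_height) for distinct nbits-bit integer keys.

    Both trees are built implicitly by splitting on successive bits, most
    significant bit first. A node where all keys share the next bit is a
    unary node: it adds one level to the trie but is skipped in PATRICIA.
    """
    def rec(group, depth):
        if len(group) <= 1:
            return 0, 0                      # a leaf (or empty subtree)
        bit = nbits - 1 - depth
        zeros = [k for k in group if not (k >> bit) & 1]
        ones = [k for k in group if (k >> bit) & 1]
        if not zeros or not ones:
            t, p = rec(group, depth + 1)     # unary node
            return t + 1, p                  # counted in the trie only
        t0, p0 = rec(zeros, depth + 1)
        t1, p1 = rec(ones, depth + 1)
        return 1 + max(t0, t1), 1 + max(p0, p1)

    return rec(list(keys), 0)


def k1(n):
    """Assumed most probable PATRICIA height (reading of the abstract's formula)."""
    lg = math.log2(n)
    return math.floor(lg + math.sqrt(2 * lg) - 1.5) + 1


def simulate(n, trials=50, nbits=64, seed=0):
    """Empirical height distributions over `trials` random key sets of size n."""
    rng = random.Random(seed)
    trie_counts, patricia_counts = Counter(), Counter()
    for _ in range(trials):
        keys = set()
        while len(keys) < n:                 # keys must be distinct
            keys.add(rng.getrandbits(nbits))
        t, p = heights(keys, nbits)
        trie_counts[t] += 1
        patricia_counts[p] += 1
    return trie_counts, patricia_counts


if __name__ == "__main__":
    n = 256
    trie_counts, patricia_counts = simulate(n)
    print("k1(n) =", k1(n))
    print("trie heights:    ", sorted(trie_counts.items()))
    print("PATRICIA heights:", sorted(patricia_counts.items()))
```

For small `n` the empirical PATRICIA heights need not sit exactly at `k1(n)`, since the stated concentration is an asymptotic result; the sketch is only meant to make the two definitions of height concrete.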
