A Note on the Asymptotic Behavior of the Heights in b-Tries for b Large

We study the limiting distribution of the height in a generalized trie in which external nodes can store up to $b$ items (the so-called $b$-tries). We assume that such a tree is built from $n$ random strings (items) generated by an unbiased memoryless source. In this paper, we discuss the case when $b$ and $n$ are both large. We identify five regions of the height distribution, which should be compared with the three regions obtained for fixed $b$. We prove that for most $n$, the limiting distribution is concentrated at the single point $k_1=\lfloor \log_2 (n/b)\rfloor +1$ as $n,b\to \infty$. This is quite different from the height distribution for fixed $b$, in which case the limiting distribution is of extreme value type and concentrated around $(1+1/b)\log_2 n$. We derive our results by analytic methods, namely generating functions and the saddle point method. We also present some numerical verification of our results.
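To make the quantities in the abstract concrete, the following is a minimal simulation sketch, not the numerical verification reported in the paper. It assumes that the height is the smallest depth $k$ at which every length-$k$ prefix is shared by at most $b$ strings, and it models the unbiased memoryless source with independent random 64-bit keys. The function names `k1` and `btrie_height` and the `bits` parameter are hypothetical and introduced only for illustration.

```python
import math
import random


def k1(n, b):
    """Concentration point k_1 = floor(log2(n/b)) + 1 from the abstract."""
    return math.floor(math.log2(n / b)) + 1


def btrie_height(keys, b, bits=64):
    """Height of a b-trie built over fixed-width integer keys.

    Assumption: the height is the smallest depth k such that every
    length-k prefix is shared by at most b keys (root at depth 0).
    Keys are read from the most significant bit downward.
    """
    groups = [list(keys)]
    depth = 0
    # Keep splitting while some node still holds more than b keys.
    while depth < bits and any(len(g) > b for g in groups):
        next_groups = []
        for g in groups:
            if len(g) <= b:
                continue  # fits in an external node, no further split
            shift = bits - 1 - depth
            zeros = [x for x in g if not (x >> shift) & 1]
            ones = [x for x in g if (x >> shift) & 1]
            next_groups.extend(part for part in (zeros, ones) if part)
        groups = next_groups
        depth += 1
    return depth


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    n, b = 20000, 200
    keys = [rng.getrandbits(64) for _ in range(n)]
    print("simulated height:", btrie_height(keys, b))
    print("k_1:", k1(n, b))
    print("(1 + 1/b) * log2(n):", (1 + 1 / b) * math.log2(n))
```

Since $2^k b \ge n$ is necessary for all $n$ keys to fit at depth $k$, the simulated height can never fall below $\lceil \log_2 (n/b)\rceil$. How closely a single run tracks $k_1$ depends on how $b$ grows with $n$, which this sketch does not attempt to characterize.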
