Profiles of Tries

Tries (from retrieval) are one of the most popular data structures on words. They are pertinent to the (internal) structure of stored words and to several splitting procedures used in diverse contexts. The profile of a trie is a parameter that represents the number of nodes (either internal or external) at the same distance from the root. It is a function of the number of strings stored in the trie and of the distance from the root. Many, if not all, trie parameters, such as height, size, depth, shortest path, and fill-up level, can be uniformly analyzed through the (external and internal) profiles. Although profiles are among the most fundamental parameters of tries, they have hardly been studied in the past. The analysis of profiles is surprisingly arduous, but once it is carried out it reveals unusually intriguing behavior. We present a detailed study of the distribution of the profiles in a trie built over random strings generated by a memoryless source. We first derive recurrences satisfied by the expected profiles and solve them asymptotically for all possible ranges of the distance from the root. It appears that profiles of tries exhibit several fascinating phenomena. When moving from the root to the leaves of a trie, the growth of the expected profiles varies. Near the root, the external profiles tend to zero at an exponential rate, and then the rate gradually rises to logarithmic; the external profiles then abruptly tend to infinity, first logarithmically and then polynomially; they then tend polynomially to zero again. Furthermore, the expected profiles of asymmetric tries oscillate in the range where profiles grow polynomially, while those of symmetric tries do not oscillate, in contrast to most shape parameters of random tries studied previously. Such periodic behavior for asymmetric tries implies that the depth satisfies a central limit theorem but not a local limit theorem of the usual form. Also, the widest levels in symmetric tries contain a linear number of nodes, differing from the order $n/\sqrt{\log n}$ for asymmetric tries, $n$ being the size of the trees. Finally, it is observed that profiles satisfy central limit theorems when the variance tends to infinity, while near the height they are distributed according to Poisson laws. As a consequence of these results we find typical behaviors of the height, shortest path, fill-up level, and depth. These results are derived here by methods of analytic algorithmics such as generating functions, the Mellin transform, Poissonization and de-Poissonization, the saddle-point method, singularity analysis, and uniform asymptotic analysis.
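To make the definition of the profiles concrete, the following minimal sketch builds a binary trie over strings drawn from a memoryless source and counts the external and internal nodes at each distance from the root. It is an illustration only, not the analytic method of the paper. The names `random_string` and `trie_profiles`, the binary alphabet, the string length of 64, and the parameter values are all assumptions made for the example; the strings are assumed to be distinct and of equal length.

```python
import random
from collections import Counter


def random_string(length, p, rng):
    """Binary string from a memoryless source: each symbol is '0' with probability p."""
    return "".join("0" if rng.random() < p else "1" for _ in range(length))


def trie_profiles(strings, depth=0, external=None, internal=None):
    """Return (external, internal) Counters mapping level -> number of nodes.

    Assumes the strings are distinct and long enough to be told apart.
    A subtree holding one string is an external node (leaf); a subtree
    holding two or more strings is an internal node that splits on the
    symbol at position `depth`. An empty subtree contributes no node.
    """
    if external is None:
        external, internal = Counter(), Counter()
    if len(strings) == 0:
        return external, internal
    if len(strings) == 1:
        external[depth] += 1
        return external, internal
    internal[depth] += 1
    for symbol in "01":
        branch = [s for s in strings if s[depth] == symbol]
        trie_profiles(branch, depth + 1, external, internal)
    return external, internal


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 1000, 0.3  # p != 0.5 gives an asymmetric trie (illustrative values)
    words = list({random_string(64, p, rng) for _ in range(n)})
    ext, inte = trie_profiles(words)
    for level in sorted(set(ext) | set(inte)):
        print(level, ext[level], inte[level])
```

Averaging the printed counts over many independent runs gives an empirical estimate of the expected profiles at each level, which can be compared qualitatively with the growth regimes described above.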
