Universal Asymptotics for Random Tries and PATRICIA Trees

Abstract

We consider random tries and random PATRICIA trees constructed from n independent strings of symbols drawn from any distribution on any discrete space. We show that many parameters Z_n of these random structures are universally stable in the sense that Z_n/E{Z_n} tends to one in probability. This occurs, for example, when Z_n is the height, the size, the depth of the last node added, the number of nodes at a given depth (also called the profile), the search time for a partial match, the stack size, or the number of nodes with k children. These properties are valid without any conditions on the string distributions.
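The stability statement can be illustrated numerically for one parameter, the height. The following is a minimal simulation sketch, not the paper's method. It assumes binary strings with independent symbols of bias p, truncated to a finite length; this is only one special case of the arbitrary distributions covered by the result. The function names (`random_string`, `lcp`, `trie_height`, `height_ratios`) and all default parameters are hypothetical choices made for this illustration. The height is computed as one plus the longest common prefix shared by any two strings, which is the depth of the deepest leaf when each string is stored at its shortest unique prefix.

```python
import random
from statistics import mean


def random_string(rng, length, p):
    """Draw `length` independent binary symbols, each equal to 1 with probability p (assumed model)."""
    return tuple(1 if rng.random() < p else 0 for _ in range(length))


def lcp(a, b):
    """Length of the longest common prefix of two symbol sequences."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k


def trie_height(strings):
    """Depth of the deepest leaf of the trie built on `strings` (at least two strings).

    After lexicographic sorting, the longest common prefix over all pairs
    is attained by some adjacent pair, so only neighbours are compared.
    """
    s = sorted(strings)
    return 1 + max(lcp(a, b) for a, b in zip(s, s[1:]))


def height_ratios(n, trials, p=0.5, length=64, seed=0):
    """Simulate `trials` tries on n strings and return each height divided by the sample mean.

    The sample mean stands in for E{Z_n}; truncating strings to `length`
    symbols is an assumption that is harmless only while `length` is much
    larger than the typical height.
    """
    rng = random.Random(seed)
    heights = [
        trie_height([random_string(rng, length, p) for _ in range(n)])
        for _ in range(trials)
    ]
    m = mean(heights)
    return [h / m for h in heights]


if __name__ == "__main__":
    for n in (100, 1000):
        ratios = height_ratios(n, trials=30)
        print(n, min(ratios), max(ratios))
```

If the stability property holds as stated, the printed minimum and maximum ratios should tend to cluster around one as n grows. A small simulation of this kind can only suggest the trend and does not verify it.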
