Average Size of a Suffix Tree for Markov Sources

We study a suffix tree built from a sequence generated by a Markovian source. Such sources are more realistic probabilistic models than memoryless sources for text generation, data compression, molecular applications, and so forth. We prove that the average size of such a suffix tree is asymptotically equivalent to the average size of a trie built over n independent sequences from the same Markovian source. Until now, this equivalence was known only for memoryless sources. We then derive a formula for the size of a trie under a Markovian model to complete the analysis for suffix trees. We accomplish our goal by applying some novel techniques of analytic combinatorics on words, also known as analytic pattern matching.
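As an informal illustration of the equivalence stated above, the following Monte Carlo sketch compares the two quantities empirically. It is not the analytic method of the paper. Several choices here are assumptions made only for illustration: a binary first-order Markov chain with a hypothetical transition matrix P, a uniformly chosen initial state, suffixes truncated to `pad` symbols, and "size" taken to mean the number of internal nodes of the non-compacted trie.

```python
import random

def markov_sequence(length, P, rng):
    """Generate a string over {0, 1, ...} from a first-order Markov chain.

    P is a row-stochastic transition matrix (hypothetical values chosen by
    the caller); the initial state is drawn uniformly, which is an assumption.
    """
    symbols = list(range(len(P)))
    state = rng.choice(symbols)
    out = [state]
    for _ in range(length - 1):
        state = rng.choices(symbols, weights=P[state])[0]
        out.append(state)
    return "".join(str(s) for s in out)

def trie_size(strings, depth=0):
    """Count internal nodes of the trie built over the given strings.

    A node is internal when at least two strings pass through it.
    """
    if len(strings) < 2:
        return 0
    buckets = {}
    for s in strings:
        if depth < len(s):
            buckets.setdefault(s[depth], []).append(s)
    return 1 + sum(trie_size(b, depth + 1) for b in buckets.values())

def suffix_tree_size(n, P, rng, pad=64):
    """Size of the trie built over the first n suffixes of one Markov text.

    Suffixes are truncated to `pad` symbols (an assumption that keeps the
    simulation finite; distinct suffixes separate long before that depth
    with high probability for the matrix used below).
    """
    text = markov_sequence(n + pad, P, rng)
    return trie_size([text[i:i + pad] for i in range(n)])

def independent_trie_size(n, P, rng, pad=64):
    """Size of the trie built over n independent Markov sequences."""
    return trie_size([markov_sequence(pad, P, rng) for _ in range(n)])

if __name__ == "__main__":
    rng = random.Random(1)
    P = [[0.8, 0.2], [0.4, 0.6]]  # hypothetical transition matrix
    n, trials = 500, 20
    avg_suffix = sum(suffix_tree_size(n, P, rng) for _ in range(trials)) / trials
    avg_trie = sum(independent_trie_size(n, P, rng) for _ in range(trials)) / trials
    print("average suffix tree size:", avg_suffix)
    print("average independent trie size:", avg_trie)
```

For moderate n the two printed averages can be compared directly; the theorem concerns their asymptotic ratio, so a finite simulation like this one can only suggest, not confirm, the result.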
