Estimating Statistics on Words Using Ambiguous Descriptions

In this article we propose an alternative way to prove some recent results on statistics on words, such as the expected number of runs or the expected sum of the run exponents. Our approach consists of designing a general framework based on the symbolic method developed in analytic combinatorics. The descriptions obtained in this framework are built so that the degree of ambiguity of an object O (i.e., the number of different descriptions corresponding to O) is exactly the value of the statistic under study for O. The asymptotic estimate of the expectation is then obtained using classical techniques from analytic combinatorics. To show the generality of our method, we not only apply it to obtain new proofs of known results but also extend these results from the uniform distribution to any memoryless distribution.
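The counting principle behind the framework can be illustrated by brute force on short words. The sketch below is not the generating-function method of the article; it only assumes the usual definition of a run (a maximal factor whose length is at least twice its smallest period) and treats each triple (start, end, period) as one "description" of the word, so that the number of descriptions of a word equals its number of runs. The function names `smallest_period`, `runs`, and `expected_runs` are hypothetical and chosen for this illustration.

```python
from itertools import product


def smallest_period(s):
    """Smallest p >= 1 such that s[k] == s[k + p] for every valid k."""
    n = len(s)
    for p in range(1, n + 1):
        if all(s[k] == s[k + p] for k in range(n - p)):
            return p
    return n  # only reached for the empty string


def runs(word):
    """List the runs of `word` as (start, end, period), end inclusive.

    Each triple plays the role of one description of `word`, so
    len(runs(word)) is the degree of ambiguity of `word`.
    """
    n = len(word)
    found = []
    for i in range(n):
        for j in range(i + 1, n):
            p = smallest_period(word[i:j + 1])
            if 2 * p > j - i + 1:
                continue  # the factor does not contain two full periods
            # Maximality: the factor cannot be extended with the same period.
            left_ok = i == 0 or word[i - 1] != word[i - 1 + p]
            right_ok = j == n - 1 or word[j + 1] != word[j + 1 - p]
            if left_ok and right_ok:
                found.append((i, j, p))
    return found


def expected_runs(n, probs):
    """Expected number of runs of a length-n word under a memoryless
    distribution, where `probs` maps each letter to its probability."""
    total = 0.0
    for letters in product(probs, repeat=n):
        weight = 1.0
        for a in letters:
            weight *= probs[a]
        # Summing the weight once per description yields the expectation.
        total += weight * len(runs("".join(letters)))
    return total


if __name__ == "__main__":
    print(runs("abab"))
    print(expected_runs(3, {"a": 0.5, "b": 0.5}))
```

Replacing `len(runs(...))` with the sum of `(j - i + 1) / p` over the triples would give the expected sum of run exponents in the same way. The enumeration is exponential in n, which is why the asymptotic analysis relies on analytic combinatorics instead.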
