String Overlaps, Pattern Matching, and Nontransitive Games

Abstract. This paper studies several topics concerning the ways in which strings can overlap. It introduces the key notion of the correlation of two strings, which represents how the second string can overlap into the first. This notion is then used to state and prove a formula for the generating function that enumerates the q-ary strings of length n containing none of a given finite set of patterns. Various generalizations of this basic result are also discussed. The formula is then applied to a variety of seemingly unrelated problems. The first application is to the nontransitive dominance relations arising from a probabilistic coin-tossing game. Another application shows that, in the worst case, no algorithm can check for the presence of a given pattern in a text without examining essentially all characters of the text. Finally, a class of polynomials arising in connection with the main result is shown to be irreducible.
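The abstract does not spell out the definition of correlation, so the following is only a minimal sketch under an assumed convention: the correlation of X and Y is a bit string of the same length as X, where bit i (counting from the left, starting at 0) is 1 exactly when Y, placed with its first character under position i of X, agrees with X on every position where the two overlap. The function name `correlation` and the string-of-bits output format are illustrative choices, not taken from the source.

```python
def correlation(x: str, y: str) -> str:
    """Return the correlation of x and y as a string of '0'/'1' bits.

    Assumed convention: bit i is '1' when y, shifted so that its first
    character sits under x[i], matches x on the whole overlapping part.
    """
    bits = []
    for i in range(len(x)):
        # Number of positions where the shifted y still lies under x.
        overlap = min(len(x) - i, len(y))
        bits.append("1" if x[i:i + overlap] == y[:overlap] else "0")
    return "".join(bits)


# Correlation of two different coin-toss patterns.
print(correlation("HTHTTH", "HTTHT"))  # 001001
# Autocorrelation: a string correlated with itself.
print(correlation("HTH", "HTH"))       # 101
```

Under this convention, the leading bit of an autocorrelation is always 1, since a string trivially matches itself at shift 0; the remaining bits record the string's self-overlaps.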
