The Expected Time to Find a String in a Random Binary Sequence

The proverbial monkey typing at random will, given sufficient time, produce the complete works of Shakespeare (along with near misses, good tries, and reams of outright garbage). How long might this be expected to take? At the risk of being overly reductive, the works of Shakespeare can be viewed as one particular long string of characters. More generally, then: what exactly is the expected waiting time for a given string to appear in a stream of random characters? The first definitive answer seems to have been given by P. T. Nielsen [11], and it was rediscovered about 10 years later by G. Blom [2]. Because the question is interesting and can be solved by relatively elementary means, one suspects that the solution has been rediscovered many times since Nielsen's paper (and possibly before). Indeed, the author of one intermediate probability text (see [12, pp. 186-187]) discusses the problem, and even hints at its general solution, without providing a reference. In this paper we describe an algorithm for computing the expected time until the first appearance of a given string in random data, and we discuss some related problems. There is little that is new in the arguments we give; they are, for the most part, adapted from the ones given in the papers cited above. We also discuss the connection with some important problems in computation, such as searching efficiently for a given string in arbitrary but non-random text. To focus the ideas we confine our attention to the alphabet of binary digits, but all results generalize easily to an arbitrary finite alphabet.
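To make the question concrete, here is a minimal sketch for the binary case, assuming independent fair bits. It uses the standard overlap formula: the expected waiting time is the sum of 2^k over every length k at which the pattern's first k symbols equal its last k symbols. The function name `expected_waiting_time` is hypothetical, and this formulation is not necessarily the form in which the paper presents its algorithm.

```python
def expected_waiting_time(pattern: str) -> int:
    """Expected number of fair random bits read until `pattern` first appears.

    Assumes each bit is independently 0 or 1 with probability 1/2.
    """
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty binary string")
    n = len(pattern)
    # Add 2**k for every k where the length-k prefix equals the length-k suffix.
    # The case k == n always matches, so the result is at least 2**n.
    return sum(2 ** k for k in range(1, n + 1) if pattern[:k] == pattern[-k:])


if __name__ == "__main__":
    for p in ("11", "10", "101", "110"):
        print(p, expected_waiting_time(p))
```

Under these assumptions the string 11 has expected waiting time 6 while 10 has expected waiting time 4, even though each occurs at any fixed position with probability 1/4. The difference comes from the self-overlap of 11.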

[1] R. Durrett. Probability: Theory and Examples, 1993.

[2] Gunnar Blom. On the mean number of random digits until a given sequence occurs, 1982.

[3] Donald E. Knuth, et al. Fast Pattern Matching in Strings, 1977, SIAM J. Comput.

[4] G. Blom, et al. How many random digits are required until given sequences are obtained?, 1982, Journal of Applied Probability.

[5] Jorge Nuno Silva, et al. Mathematical Games, 1959, Nature.

[6] Alfred V. Aho, et al. Compilers: Principles, Techniques, and Tools, 1986, Addison-Wesley series in computer science / World student series edition.