Waiting time distribution of generalized later patterns

This paper generalizes the concept of later waiting time distributions for patterns in multi-state trials to a collection of compound patterns, each of which must be counted a pattern-specific number of times, and gives a practical method for computing the generalized distribution. The solution applies to overlapping counting and to two types of non-overlapping counting, and the underlying sequences are assumed to be Markovian of general order. Patterns may be weighted so that an occurrence is counted multiple times, and a pattern may be completely contained in a longer pattern. Probabilities are computed through an auxiliary Markov chain. Because the state space of the auxiliary chain can be quite large if it is set up naively, an algorithm is given for generating a "minimal" state space that omits states that can never be reached. For overlapping counting, a formula relating probabilities of intersections of events to probabilities of unions of subsets of those events is also used, so that the distribution can alternatively be computed in terms of probabilities for competing patterns. A detailed example illustrates the methodology.
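
As a rough illustration of the auxiliary-chain idea, the Python sketch below computes a later waiting time distribution in a deliberately simplified setting. It is not the paper's algorithm: it assumes independent, identically distributed trials (not higher-order Markovian sequences), simple unweighted patterns given as strings (not compound patterns), and overlapping counting only. The function name `later_waiting_time_pmf` and its parameters are hypothetical. Each state of the auxiliary chain pairs the most recent symbols with the capped occurrence count of every pattern, and because probability mass is only propagated forward from the initial state, unreachable states are never created.

```python
from collections import defaultdict


def later_waiting_time_pmf(patterns, quotas, probs, n_max):
    """Return [P(W = 1), ..., P(W = n_max)], where W is the first trial at
    which every pattern has occurred at least its quota of times.

    Simplifying assumptions (not from the paper): i.i.d. trials, simple
    string patterns, overlapping counting, unit weights.
    """
    # Number of past symbols needed to detect any pattern on the next trial.
    mem = max(len(p) for p in patterns) - 1
    target = tuple(quotas)

    # Auxiliary chain state: (recent symbols, capped count per pattern).
    dist = {("", tuple(0 for _ in patterns)): 1.0}
    pmf = []

    for _ in range(n_max):
        nxt = defaultdict(float)
        absorbed = 0.0
        for (hist, counts), p in dist.items():
            for sym, q in probs.items():
                seq = hist + sym
                # Overlapping counting: a pattern is counted whenever the
                # sequence so far ends with it; counts are capped at quota.
                new_counts = tuple(
                    min(c + seq.endswith(pat), r)
                    for c, pat, r in zip(counts, patterns, quotas)
                )
                if new_counts == target:
                    absorbed += p * q  # all quotas met on this trial
                else:
                    new_hist = seq[-mem:] if mem > 0 else ""
                    nxt[(new_hist, new_counts)] += p * q
        pmf.append(absorbed)
        dist = nxt  # only reachable, non-absorbed states are kept
    return pmf


if __name__ == "__main__":
    fair = {"H": 0.5, "T": 0.5}
    # Wait until both a head and a tail have been seen at least once.
    print(later_waiting_time_pmf(["H", "T"], [1, 1], fair, 5))
```

Capping each count at its quota keeps the state space finite, and dropping absorbed mass at every step means the returned values are the probabilities that the last quota is first met on exactly that trial.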
