Distributions associated with general runs and patterns in hidden Markov models

This paper gives a method for computing distributions associated with patterns in the state sequence of a hidden Markov model, conditional on observing all or part of the observation sequence. Probabilities are computed for very general classes of patterns (competing patterns and generalized later patterns), so the theory includes as special cases results for a large class of widely applicable problems. The unobserved state sequence is assumed to be Markovian with a general order of dependence. An auxiliary Markov chain is associated with the state sequence and is used to simplify the computations. Two examples illustrate the methodology. The first mainly illustrates the basic steps in applying the theory; the second is a more detailed application to DNA sequences and shows that the methods can be adapted to include restrictions related to biological knowledge.
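To make the auxiliary-chain idea concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes a first-order hidden chain and the simplest possible pattern (a run of at least `k` consecutive visits to one hidden state), whereas the paper treats higher-order dependence and much more general patterns. The function name `run_probability_given_obs` and all parameter names are hypothetical. The auxiliary chain pairs each hidden state with a run-length counter that becomes absorbing once it reaches `k`, and a forward recursion over these pairs gives the probability of the pattern conditional on the full observation sequence.

```python
def run_probability_given_obs(pi, A, B, obs, target, k):
    """Sketch: P(hidden path contains a run of `target` of length >= k | obs).

    Assumptions (not from the paper): first-order hidden chain with initial
    distribution pi[s], transitions A[s][t], emissions B[s][y]; k >= 1.
    """
    n = len(pi)
    # alpha[(s, c)] = joint probability of the observations so far, current
    # hidden state s, and counter c, where c is the length of the current run
    # of `target`, frozen at k once the pattern has occurred (absorbing).
    alpha = {}
    for s in range(n):
        c = 1 if s == target else 0
        w = pi[s] * B[s][obs[0]]
        if w > 0:
            alpha[(s, c)] = alpha.get((s, c), 0.0) + w

    for y in obs[1:]:
        new_alpha = {}
        for (s, c), p in alpha.items():
            for t in range(n):
                if c == k:
                    c_next = k          # pattern already seen: stay absorbed
                elif t == target:
                    c_next = c + 1      # run of `target` continues
                else:
                    c_next = 0          # run is broken
                w = p * A[s][t] * B[t][y]
                if w > 0:
                    new_alpha[(t, c_next)] = new_alpha.get((t, c_next), 0.0) + w
        alpha = new_alpha

    total = sum(alpha.values())                               # P(obs)
    hit = sum(p for (s, c), p in alpha.items() if c == k)     # P(obs, pattern)
    return hit / total


# Illustrative two-state model; all numbers are made up for the example.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.1, 0.9]]
print(run_probability_given_obs(pi, A, B, [1, 1, 0, 1, 1, 1], target=1, k=3))
```

The sketch works with unscaled forward probabilities, so for long sequences one would rescale `alpha` at each step to avoid underflow. Conditioning on only part of the observation sequence could be handled, under the same assumptions, by dropping the emission factor at the unobserved positions.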
