Estimating Hidden Semi-Markov Chains From Discrete Sequences

This article addresses the estimation of hidden semi-Markov chains from nonstationary discrete sequences. Hidden semi-Markov chains are particularly useful for modeling the succession of homogeneous zones or segments along sequences. A discrete hidden semi-Markov chain is composed of a nonobservable state process, which is a semi-Markov chain, and a discrete output process. Hidden semi-Markov chains generalize hidden Markov chains. In a hidden Markov chain, the time spent in a state is implicitly geometrically distributed, whereas a hidden semi-Markov chain gives each state an explicit sojourn-time distribution. This enables the modeling of various durational structures. From an algorithmic point of view, a new forward-backward algorithm is proposed whose complexity in terms of sequence length is similar to that of the Viterbi algorithm: quadratic in time in the worst case, and linear in space. This opens the way to maximum likelihood estimation of hidden semi-Markov chains from long sequences. The statistical modeling approach is illustrated by the analysis of branching and flowering patterns in plants.
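To make the durational structure and the complexity statement concrete, here is a minimal Python sketch of a forward recursion for the likelihood of a discrete hidden semi-Markov chain. It is not the forward-backward algorithm proposed in the article, which also has a backward pass and works with normalized quantities. It is a plain, unnormalized explicit-duration recursion that rests on several assumptions: sojourn-time distributions are truncated at a maximum length D, there are no self-transitions (P[i][i] = 0), and the last sojourn is right-censored because the sequence may end before the last segment does. The function names `hsmc_likelihood` and `survivor`, the tables `E` and `F`, and the parameter layout are hypothetical. The inner loop over sojourn lengths u is the source of the time cost that grows quadratically with sequence length when D is of the order of T. Only the T×J tables `E` and `F` are stored, so memory stays linear in T.

```python
def survivor(d):
    """Survivor function S(u) = P(U >= u), u = 1..D, where d[u-1] = P(U = u)."""
    s, total = [], 0.0
    for p in reversed(d):
        total += p
        s.append(total)
    return s[::-1]


def hsmc_likelihood(y, pi, P, d, b):
    """Likelihood of observed symbols y under a discrete hidden semi-Markov chain.

    Hypothetical parameter layout (an assumption, not the article's notation):
      pi[j]   initial state probabilities
      P[i][j] transition probabilities between states, with P[i][i] == 0
      d[j]    sojourn-time distribution of state j: d[j][u-1] = P(U_j = u), u = 1..D
      b[j][k] probability of emitting symbol k in state j
    Unnormalized: fine for short sequences, underflows on long ones.
    """
    T, J = len(y), len(pi)
    S = [survivor(dj) for dj in d]
    # E[t][j]: P(a segment in state j starts at t, y[0..t-1])
    E = [[0.0] * J for _ in range(T)]
    # F[t][j]: P(a segment in state j ends at t, y[0..t]); next state differs
    F = [[0.0] * J for _ in range(T)]
    for t in range(T):
        for j in range(J):
            # O(J) work per (t, j): O(J^2 T) overall
            E[t][j] = pi[j] if t == 0 else sum(F[t - 1][i] * P[i][j] for i in range(J))
        for j in range(J):
            obs = 1.0  # running product of emissions b_j(y[t-u+1..t])
            # O(D) work per (t, j); D can be as large as T, hence O(J T^2) worst case
            for u in range(1, min(t + 1, len(d[j])) + 1):
                obs *= b[j][y[t - u + 1]]
                F[t][j] += obs * d[j][u - 1] * E[t - u + 1][j]
    # The last segment is right-censored: use S_j(u) instead of d_j(u)
    likelihood = 0.0
    for j in range(J):
        obs = 1.0
        for u in range(1, min(T, len(d[j])) + 1):
            obs *= b[j][y[T - u]]
            likelihood += obs * S[j][u - 1] * E[T - u][j]
    return likelihood


if __name__ == "__main__":
    # Illustrative 3-state, 2-symbol model (made-up parameters)
    pi = [0.5, 0.3, 0.2]
    P = [[0.0, 0.7, 0.3], [0.5, 0.0, 0.5], [0.4, 0.6, 0.0]]
    d = [[0.2, 0.5, 0.3], [0.6, 0.4], [0.1, 0.2, 0.3, 0.4]]
    b = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    print(hsmc_likelihood([0, 0, 1, 1, 0], pi, P, d, b))
```

Because self-transitions are excluded, two consecutive segments can never be in the same state, so each state path has a unique decomposition into segments with explicit durations. For realistic sequence lengths, the recursion would need per-step normalization or log-space arithmetic to avoid underflow.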
