Asynchronous HMM with applications to speech recognition

We develop a novel formalism for modeling speech signals that are irregularly or incompletely sampled. This situation arises in real-world applications where the speech signal is transmitted over an error-prone channel that can drop parts of the signal. Typical speech recognition systems based on hidden Markov models (HMMs) cannot handle such data, because HMMs assume that observations are complete and made at regular intervals. We introduce the asynchronous HMM, a variant of the inhomogeneous HMM commonly used in bioinformatics, and show how it can model irregularly or incompletely sampled data. We also briefly present a nested EM algorithm for learning the parameters of the asynchronous HMM. Evaluation on real-world speech data, modified to simulate channel errors, shows that this model and its variants significantly outperform both the standard HMM and methods based on data interpolation.
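
To make the core difficulty concrete, the sketch below shows one standard way an HMM can score observations taken at known but irregular time steps: the hidden chain keeps evolving through the dropped frames, so the transition between two consecutive observations separated by a gap of `g` steps is the `g`-th power of the one-step transition matrix. This is an illustrative assumption, not the asynchronous HMM or the nested EM algorithm described above. It assumes discrete emissions, integer timestamps that are known in advance, and a non-empty, strictly increasing `times` list, and it only computes a log-likelihood with the scaled forward algorithm. The function name `forward_log_likelihood` and all parameter names are hypothetical.

```python
import numpy as np


def forward_log_likelihood(pi, A, B, obs, times):
    """Scaled forward pass for observations at known, irregular integer times.

    Hypothetical illustration (not the paper's asynchronous HMM):
    pi    -- (S,) initial state distribution at times[0]
    A     -- (S, S) one-step transitions, A[i, j] = P(s_{t+1}=j | s_t=i)
    B     -- (S, V) discrete emissions, B[i, v] = P(o=v | s=i)
    obs   -- observed symbol indices (non-empty)
    times -- strictly increasing integer time step of each observation
    """
    # Initialise with the first observation.
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    alpha = alpha / c          # rescale to avoid numerical underflow
    log_lik = np.log(c)        # accumulate log of the scaling factors
    for k in range(1, len(obs)):
        gap = times[k] - times[k - 1]
        # Dropped frames are marginalised out: the chain takes `gap` steps,
        # so the effective (time-inhomogeneous) transition matrix is A**gap.
        A_gap = np.linalg.matrix_power(A, gap)
        alpha = (alpha @ A_gap) * B[:, obs[k]]
        c = alpha.sum()
        alpha = alpha / c
        log_lik += np.log(c)
    return log_lik
```

When every gap is 1, this reduces to the ordinary forward algorithm. In that case the observations are regular and complete, which matches the standard HMM assumption the abstract describes. `np.linalg.matrix_power` recomputes `A**gap` at each step. If many gaps repeat, caching the powers would be a simple optimisation.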
