Hidden Markov models for trajectory modeling

Current state-of-the-art statistical speech recognition systems use hidden Markov models (HMMs) to model the speech signal. However, it is well known that HMMs do not exploit the time-dependence in the speech process, since they are limited by the assumption that observations are conditionally independent given the state sequence. Alternative techniques, such as segment modeling approaches, can effectively exploit time-dependencies in the acoustic signal by discarding the observation independence assumption. However, giving up the basic HMM structure often carries a high computational price for the improved acoustic models. In this paper, we introduce the parallel path HMM, which exploits the time-dependence in speech via parametric trajectory models while maintaining the HMM framework. We present preliminary results on Switchboard, a large vocabulary conversational speech recognition task, demonstrating both improved modeling and potential for improved recognition performance.
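
The independence assumption and the trajectory idea mentioned above can be written out as a rough sketch. The notation here is an assumption made for illustration and is not taken from the paper: o_t is the observation at frame t, q_t is the state at frame t, tau_t is time normalized to [0, 1] within a segment, and the polynomial form of the trajectory is one common choice rather than necessarily the authors' own.

```latex
% Standard HMM: observations are conditionally independent given the state sequence
\[
p(o_1, \dots, o_T \mid q_1, \dots, q_T) = \prod_{t=1}^{T} p(o_t \mid q_t)
\]

% Illustrative parametric trajectory (assumed polynomial form):
% the segment mean varies smoothly with normalized time \tau_t \in [0, 1]
\[
o_t = \sum_{k=0}^{K} b_k \, \tau_t^{k} + e_t, \qquad e_t \sim \mathcal{N}(0, \Sigma)
\]
```

In the first expression each frame depends only on its own state, so the model cannot represent how features evolve within a state. In the second, the coefficients b_k tie the frames of a segment together through a shared trajectory, which is the kind of time-dependence the abstract refers to.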
