Paper information - Boosting HMM performance with a memory upgrade


The state of the art in automatic speech recognition is distinctly Markovian. The ubiquitous ‘beads-on-a-string’ approach, where sentences are explained as a sequence of words, words as a sequence of phones and phones as a sequence of acoustically stable states, is bound to lose a lot of dynamic information. In this paper we show that a combination with example-based recognition can be used to recapture some of that information. A new approach to combining Hidden Markov Model (HMM) and phone-example-based continuous speech recognition is presented. Experiments show that the combination outperforms the HMM recognizer, and indicate that adding long-span information is especially beneficial.

Dirk Van Compernolle | Kris Demuynck | Mathias De Wachter

[1] Jithendra Vepa, et al. Improving speech recognition using a data-driven approach, 2005, INTERSPEECH.

[2] Dirk Van Compernolle, et al. Fast and accurate acoustic modelling with semi-continuous HMMs, 1998, Speech Commun.

[3] Patrick Wambacq, et al. An efficient search space representation for large vocabulary continuous speech recognition, 2000, Speech Commun.

[4] S. Axelrod, et al. Combination of hidden Markov models with dynamic time warping for speech recognition, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5] Hugo Van hamme, et al. FLavor: a flexible architecture for LVCSR, 2003, INTERSPEECH.

[6] J. Bilmes, et al. Discriminatively Structured Graphical Models for Speech Recognition: The Graphical Models Team, JHU 2001 Summer Workshop, 2001.

[7] Dirk Van Compernolle, et al. A discriminative locally weighted distance measure for speaker independent template based speech recognition, 2004, INTERSPEECH.

[8] Patrick Wambacq, et al. A locally weighted distance measure for example based speech recognition, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9] Patrick Wambacq, et al. Data driven example based continuous speech recognition, 2003, INTERSPEECH.

[10] Dirk Van Compernolle, et al. Optimal feature sub-space selection based on discriminant analysis, 1999, EUROSPEECH.

[11] Hermann Ney, et al. Bootstrap estimates for confidence intervals in ASR performance evaluation, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12] Mari Ostendorf, et al. From HMM's to segment models: a unified view of stochastic modeling for speech recognition, 1996, IEEE Trans. Speech Audio Process.

[13] Lori Lamel, et al. Speaker-independent continuous speech dictation, 1993, Speech Communication.