Boosting HMM performance with a memory upgrade

The state of the art in automatic speech recognition is distinctly Markovian. The ubiquitous ‘beads-on-a-string’ approach, in which sentences are explained as sequences of words, words as sequences of phones, and phones as sequences of acoustically stable states, is bound to lose a lot of dynamic information. In this paper we show that combining this approach with example-based recognition can recapture some of that information. We present a new approach to combining Hidden Markov Model (HMM) recognition with phone-example-based continuous speech recognition. Experiments show that the combination outperforms the HMM recognizer, and indicate that adding long-span information is especially beneficial.
