LATTICE RESCORING EXPERIMENTS WITH DURATION MODELS

This paper reports on experiments using phone and word duration models to improve speech recognition accuracy. The duration information is integrated into state-of-the-art large vocabulary speech recognition systems by rescoring word lattices that include phone-level segmentations. Experimental results are given for a conversational telephone speech (CTS) task in French and for the TC-Star EPPS transcription task in Spanish and English. An absolute word error rate reduction of about 0.5% is observed for the CTS task, and smaller but consistent gains are observed for the EPPS task.
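The rescoring idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the duration model family or the lattice format, so the log-normal phone duration models, the parameter values in `DURATION_MODELS`, the edge dictionary layout, and the `weight` parameter are all assumptions made for illustration. Each lattice edge carries a word, a combined acoustic and language model log score, and a phone-level segmentation given in frames; the duration log-probability is added to the edge score before the best path is searched.

```python
import math

# Hypothetical log-normal duration models: phone -> (mean, std) of log(frames).
# These values are invented for illustration, not estimated from data.
DURATION_MODELS = {
    "b": (1.6, 0.4),
    "a": (2.2, 0.5),
    "t": (1.8, 0.4),
    "d": (1.7, 0.4),
}


def log_duration_prob(phone, frames):
    """Log density of a phone duration (in frames) under its log-normal model."""
    mu, sigma = DURATION_MODELS[phone]
    x = math.log(frames)
    norm = -math.log(frames * sigma * math.sqrt(2.0 * math.pi))
    return norm - (x - mu) ** 2 / (2.0 * sigma ** 2)


def rescore_edge(edge, weight):
    """Add the weighted duration log-probability to the edge's original score."""
    duration_score = sum(log_duration_prob(p, d) for p, d in edge["phones"])
    return edge["score"] + weight * duration_score


def best_path(edges, start, end, weight=1.0):
    """Return (score, words) of the best path through the rescored lattice.

    Assumes nodes are integers numbered in topological order (src < dst),
    so that processing edges by increasing source node is a valid DP order.
    """
    best = {start: (0.0, [])}
    for edge in sorted(edges, key=lambda e: e["src"]):
        if edge["src"] not in best:
            continue  # source node is unreachable from the start node
        score, words = best[edge["src"]]
        candidate = score + rescore_edge(edge, weight)
        if edge["dst"] not in best or candidate > best[edge["dst"]][0]:
            best[edge["dst"]] = (candidate, words + [edge["word"]])
    return best[end]


# Toy lattice: two competing words on the same span. "bad" has the better
# original score but an implausibly short phone segmentation.
LATTICE = [
    {"src": 0, "dst": 1, "word": "bat", "score": -10.0,
     "phones": [("b", 5), ("a", 9), ("t", 6)]},
    {"src": 0, "dst": 1, "word": "bad", "score": -9.5,
     "phones": [("b", 5), ("a", 1), ("d", 1)]},
]

if __name__ == "__main__":
    print(best_path(LATTICE, 0, 1, weight=0.0)[1])  # duration model ignored
    print(best_path(LATTICE, 0, 1, weight=1.0)[1])  # duration model applied
```

In this toy lattice the duration term penalizes the hypothesis with the implausible segmentation, so the ranking of the two words changes once the weight is nonzero. In a real system the weight would be tuned on development data, and word-level duration scores could be added to the edge score in the same way.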
