Paper information - MODELING WORD DURATION FOR BETTER SPEECH RECOGNITION

We describe a new method of modeling duration at the word level. These duration models are easily trained from the acoustic training data and can be used to rescore N-best lists of recognition hypotheses. The models capture some well-known durational effects, such as prepausal lengthening, and incorporate a simple backoff mechanism to handle unseen words during rescoring. Experiments on various large vocabulary conversational speech recognition (LVCSR) evaluation sets showed consistent word error rate (WER) improvements of 0.7 to 1.0%.
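The abstract does not give the model's form, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes one Gaussian over log word duration per word, separate models for prepausal and non-prepausal tokens, and a backoff chain from (word, prepausal context) to word to a single global model. All names (`train_duration_models`, `duration_log_prob`, `rescore_nbest`, `MIN_COUNT`, `weight`) are hypothetical.

```python
import math
from collections import defaultdict

MIN_COUNT = 2  # hypothetical: minimum tokens needed to train a specific model


def fit_gaussian(values):
    """Return (mean, variance) of the values, with a small variance floor."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-3)


def train_duration_models(alignments):
    """alignments: iterable of (word, duration_in_seconds, is_prepausal)."""
    samples = defaultdict(list)
    all_logs = []
    for word, dur, prepausal in alignments:
        log_d = math.log(dur)
        samples[(word, prepausal)].append(log_d)  # context-specific model
        samples[(word, None)].append(log_d)       # context-independent model
        all_logs.append(log_d)
    models = {k: fit_gaussian(v) for k, v in samples.items()
              if len(v) >= MIN_COUNT}
    models[None] = fit_gaussian(all_logs)         # global backoff model
    return models


def duration_log_prob(models, word, dur, prepausal):
    """Score a duration with the most specific model available (backoff)."""
    for key in ((word, prepausal), (word, None), None):
        if key in models:
            mean, var = models[key]
            x = math.log(dur)
            return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def rescore_nbest(models, nbest, weight=1.0):
    """nbest: list of (base_score, [(word, dur, is_prepausal), ...]).

    Returns the hypothesis with the highest combined score, where the
    combined score is base_score + weight * total duration log-probability.
    """
    def combined(hyp):
        base, words = hyp
        return base + weight * sum(
            duration_log_prob(models, w, d, p) for w, d, p in words)
    return max(nbest, key=combined)
```

In this sketch the word durations would come from forced alignments of the training transcripts, and `weight` would be tuned on held-out data together with the other knowledge-source weights used in rescoring.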

Venkata Ramana Rao | V. Rao
