Speaking rate dependent acoustic modeling for spontaneous lecture speech recognition

This paper addresses large-vocabulary spontaneous speech recognition, focusing on acoustic modeling that takes the speaking rate into account. Using a corpus of real lecture speech collected under a priority research project in Japan, we built a baseline acoustic model and evaluated it on the automatic transcription of oral presentations by experienced speakers, obtaining a word accuracy of 58.2%. We observed that the speaking rate differs significantly from that of read speech. To handle fast and poorly articulated phone segments, we explore several extensions of the modeling: state-skipping modeling, speaking-rate-dependent models, and syllable sub-word modeling. As a result, the word error rate was reduced by 0.8% to 2.0% absolute. We also address language modeling, in particular the effective use of various large text corpora.
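To illustrate why state skipping can help with fast phone segments, the following is a minimal sketch, not the model used in the paper. It assumes a three-state left-to-right phone HMM and made-up transition probabilities; the function names `left_to_right_transitions` and `min_frames` are hypothetical. Without skips, every emitting state must consume at least one frame, so a phone needs at least three frames; allowing a skip over one state lowers that minimum to two.

```python
def left_to_right_transitions(n_states, self_loop=0.6, skip=0.0):
    """Build a left-to-right transition matrix with an extra exit column.

    Row i holds the probabilities of moving from emitting state i to
    states 0..n_states-1 and to the exit (column n_states).
    The probability values are illustrative assumptions only.
    """
    rows = []
    for i in range(n_states):
        row = [0.0] * (n_states + 1)
        row[i] = self_loop                    # stay in the same state
        remaining = 1.0 - self_loop
        if skip > 0.0 and i + 2 <= n_states:
            row[i + 2] = skip                 # jump over the next state
            row[i + 1] = remaining - skip     # ordinary forward step
        else:
            row[i + 1] = remaining            # forward step only
        rows.append(row)
    return rows


def min_frames(trans):
    """Fewest frames needed to pass from state 0 to the exit.

    Each visited emitting state consumes one frame; the exit consumes none.
    """
    n = len(trans)
    best = [None] * (n + 1)
    best[0] = 1                               # state 0 emits the first frame
    for i in range(n):
        if best[i] is None:
            continue
        for j in range(i + 1, n + 1):
            if trans[i][j] > 0.0:
                cost = best[i] + (1 if j < n else 0)
                if best[j] is None or cost < best[j]:
                    best[j] = cost
    return best[n]


baseline = left_to_right_transitions(3)
skipping = left_to_right_transitions(3, skip=0.1)
print(min_frames(baseline), min_frames(skipping))
```

Under these assumptions, a segment shorter than three frames cannot be aligned to the baseline topology at all, whereas the skipping topology can still account for it.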
