Statistical language modeling with prosodic boundaries and its use for continuous speech recognition
A new statistical language modeling method was proposed in which word n-grams are counted separately for transitions that cross an accent phrase boundary and for those that do not. Since such counting requires a large speech corpus, which can hardly be prepared, part-of-speech (POS) n-grams were instead first counted for the two cases on a small speech corpus, and the result was then used to divide the word n-gram counts of a large text corpus accordingly. In this way, the two types of word n-gram models can be obtained. Using the ATR continuous speech corpus uttered by two speakers, the perplexity reduction of the proposed model relative to the baseline model was calculated for word bigrams. When the accent phrase boundary information of the speech corpus was used, the reduction reached 11%; when the boundaries were extracted using our previously developed method based on mora-F0 transition modeling, it still exceeded 8%. A reduction of around 5% was still observed for sentences not included in the calculation of the POS bigrams, with boundaries automatically extracted from another speaker's speech. When the obtained bigram was applied to continuous speech recognition, word accuracy improved by two percentage points over the baseline model.
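The count-splitting idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each word is assumed to carry a single POS tag, POS pairs never seen in the speech corpus fall back to the overall boundary-crossing rate, add-one smoothing stands in for whatever smoothing the original models used, and "perplexity reduction" is taken to be the relative reduction (baseline minus proposed, divided by baseline). All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def boundary_ratios(pos_bigrams):
    """For each POS bigram in a small prosodically labelled speech corpus, estimate
    the fraction of its occurrences that cross an accent phrase boundary.

    pos_bigrams: iterable of (pos1, pos2, crossed) triples.
    Returns (ratios, overall); overall is the corpus-wide crossing rate, used here
    (an assumption) as a fallback for POS pairs unseen in the speech corpus."""
    total, cross = Counter(), Counter()
    for p1, p2, crossed in pos_bigrams:
        total[(p1, p2)] += 1
        if crossed:
            cross[(p1, p2)] += 1
    ratios = {pair: cross[pair] / n for pair, n in total.items()}
    overall = sum(cross.values()) / sum(total.values())
    return ratios, overall


def split_word_counts(word_counts, pos_of, ratios, fallback):
    """Divide text-corpus word bigram counts into a crossing part and a
    non-crossing part in proportion to the ratio of the matching POS bigram."""
    crossing, non_crossing = Counter(), Counter()
    for (w1, w2), c in word_counts.items():
        r = ratios.get((pos_of[w1], pos_of[w2]), fallback)
        crossing[(w1, w2)] = c * r
        non_crossing[(w1, w2)] = c * (1.0 - r)
    return crossing, non_crossing


class BigramModel:
    """Word bigram model with add-one smoothing (a stand-in smoothing choice)."""

    def __init__(self, counts, vocab):
        self.counts = counts
        self.vocab_size = len(vocab)
        self.history = Counter()  # total (possibly fractional) count of each history word
        for (w1, _), c in counts.items():
            self.history[w1] += c

    def prob(self, w1, w2):
        return (self.counts.get((w1, w2), 0.0) + 1.0) / (self.history[w1] + self.vocab_size)


def perplexity(sentences, model_for):
    """sentences: list of (words, boundaries); boundaries[i] is True when an accent
    phrase boundary lies between words[i] and words[i + 1].
    model_for(crossed) returns the bigram model to use for that transition."""
    log_sum, n = 0.0, 0
    for words, boundaries in sentences:
        for i in range(len(words) - 1):
            log_sum += math.log(model_for(boundaries[i]).prob(words[i], words[i + 1]))
            n += 1
    return math.exp(-log_sum / n)


# Hypothetical toy data: three tokens, each with one POS tag.
pos_of = {"x": "X", "a": "A", "b": "B"}
vocab = set(pos_of)
speech_pos = [("X", "A", True)] * 3 + [("X", "B", False)] * 3  # small labelled speech corpus
text_counts = Counter({("x", "a"): 10, ("x", "b"): 10})        # large text corpus counts
test_sents = [(["x", "a"], [True]), (["x", "b"], [False])]     # boundary labels per transition

ratios, overall = boundary_ratios(speech_pos)
cross_counts, non_cross_counts = split_word_counts(text_counts, pos_of, ratios, overall)
baseline = BigramModel(text_counts, vocab)
crossing_lm = BigramModel(cross_counts, vocab)
non_crossing_lm = BigramModel(non_cross_counts, vocab)

pp_base = perplexity(test_sents, lambda crossed: baseline)
pp_prop = perplexity(test_sents, lambda crossed: crossing_lm if crossed else non_crossing_lm)
print(f"baseline {pp_base:.3f}, proposed {pp_prop:.3f}, reduction {1 - pp_prop / pp_base:.1%}")
```

For each word transition, the sketch selects the crossing or non-crossing model according to the boundary label, which may come from hand labelling or from automatic boundary extraction. On this toy data, where the boundary fully predicts which successor follows "x", the baseline perplexity is 23/11 and the split model's is 13/11. Real corpora would show far smaller gains, and the result depends heavily on the smoothing choice, since splitting thins out the counts each model sees.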