Fast speakers in large vocabulary continuous speech recognition: analysis & antidotes
Nikki Mirghafori, Eric Fosler, and Nelson Morgan
International Computer Science Institute and University of California, Berkeley
{nikki, fosler, morgan}@icsi.berkeley.edu

ABSTRACT

The performance of automatic speech recognizers (ASR) typically degrades for test speakers with "outlier" characteristics, for example, speakers with a foreign accent or a fast speaking rate. In this work, we concentrate on the latter. Consistent with other researchers, we have observed that the word recognition error is significantly higher for speakers with an exceptionally high speaking rate. We have investigated two possible causes for this effect. First, inherent spectral differences may cause the features extracted for these outliers to be significantly different from those of normal speech. Second, due to phone omissions and duration reduction, the normal word models may not be suitable for fast speech. Based on our exploratory experiments on the TIMIT and WSJ corpora, we believe the spectral differences and duration reduction are both significant sources of the increased error. By adapting our MLP phonetic probability estimator to fast speech and employing fast-speaker word models, we have been able to eliminate about 16% of the fast-speaker word recognition errors.
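The abstract does not say how speaking rate is measured or where the cutoff for "exceptionally high" lies. As a rough illustration only, the sketch below assumes rate is counted in phones per second over the non-silence segments of a time-aligned phonetic transcription, and flags speakers whose rate lies more than a chosen number of standard deviations above the mean. The names `speaking_rate`, `fast_speakers`, `SILENCE_LABELS`, and the `num_std` threshold are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev

# Assumed silence/pause labels; the real label set depends on the corpus.
SILENCE_LABELS = {"sil", "pau", "h#"}


def speaking_rate(segments):
    """Return phones per second over the non-silence segments.

    segments: iterable of (label, start_sec, end_sec) tuples taken from a
    time-aligned phonetic transcription.
    """
    speech = [(start, end) for label, start, end in segments
              if label not in SILENCE_LABELS]
    total_time = sum(end - start for start, end in speech)
    if total_time <= 0:
        raise ValueError("no speech segments with positive duration")
    return len(speech) / total_time


def fast_speakers(rates, num_std=1.0):
    """Return the speakers whose rate exceeds mean + num_std * std.

    rates: dict mapping speaker id to speaking rate (phones per second).
    The threshold rule is an illustrative assumption, not the paper's.
    """
    values = list(rates.values())
    threshold = mean(values) + num_std * pstdev(values)
    return sorted(spk for spk, rate in rates.items() if rate > threshold)
```

For example, `fast_speakers({"a": 10.0, "b": 11.0, "c": 12.0, "d": 18.0})` flags only speaker `"d"`. Such a split could then be used to report word error separately for the fast and the remaining speakers.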