This paper proposes a speech-recognition-based automatic pronunciation evaluation method for non-native language learners that uses pronunciation variations and anti-models. The proposed method consists of (a) a speech recognition step and (b) a pronunciation analysis step. In the first step, Viterbi decoding is performed with a multiple pronunciation dictionary for non-native language learners, which is generated by an indirect data-driven method. This step yields, for the input speech, the recognized phoneme sequence, the log-likelihoods of the acoustic models and anti-models, and the duration of each phoneme. In the second step, each recognized phoneme is evaluated using the speech recognition results and the reference phoneme sequence. For the automatic pronunciation evaluation experiments, we select English as the target language and Korean speakers as the non-native language learners. The experiments show that the proposed method achieves an average of the false rejection rate (FRR) and the false alarm rate (FAR) of 32.4%, outperforming both an anti-model-based method and a pronunciation-variant-based method.
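The two-step evaluation and the reported metric can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the scoring rule or the metric definitions, so the duration-normalized log-likelihood ratio, the threshold of 0.0, the name `PhonemeResult`, and the readings of FRR (correct phonemes that are rejected) and FAR (mispronounced phonemes that are accepted) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PhonemeResult:
    """One recognized phoneme from the decoder (hypothetical container)."""
    phoneme: str
    loglik_model: float   # log-likelihood under the acoustic model
    loglik_anti: float    # log-likelihood under the anti-model
    duration_frames: int  # phoneme duration in frames


def llr_score(r: PhonemeResult) -> float:
    # Assumed scoring rule: log-likelihood ratio normalized by duration.
    return (r.loglik_model - r.loglik_anti) / max(r.duration_frames, 1)


def accept(r: PhonemeResult, reference_phoneme: str, threshold: float = 0.0) -> bool:
    # Assumed decision rule: the recognized phoneme must match the
    # reference and its normalized score must reach the threshold.
    return r.phoneme == reference_phoneme and llr_score(r) >= threshold


def error_rates(decisions, labels):
    """Return (FRR, FAR, average) under the assumed definitions.

    decisions: True where the system accepted the phoneme as correct.
    labels:    True where the phoneme was actually pronounced correctly.
    """
    n_correct = sum(1 for lab in labels if lab)
    n_wrong = len(labels) - n_correct
    false_rej = sum(1 for d, lab in zip(decisions, labels) if lab and not d)
    false_alarm = sum(1 for d, lab in zip(decisions, labels) if not lab and d)
    frr = false_rej / n_correct if n_correct else 0.0
    far = false_alarm / n_wrong if n_wrong else 0.0
    return frr, far, (frr + far) / 2
```

In this reading, the 32.4% figure corresponds to the third value returned by `error_rates`, expressed as a percentage. Averaging the two rates gives a single number that penalizes both rejecting good pronunciations and accepting bad ones.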