Speech Recognition Based Pronunciation Evaluation Using Pronunciation Variations and Anti-models for Non-native Language Learners

This paper proposes an automatic pronunciation evaluation method for non-native language learners that is based on speech recognition and uses both pronunciation variations and anti-models. The proposed method consists of (a) a speech recognition step and (b) a pronunciation analysis step. In the first step, Viterbi decoding is performed with a multiple pronunciation dictionary for non-native language learners, which is generated by an indirect data-driven method. This step yields, for the input speech, the recognized phoneme sequence, the log-likelihoods of the acoustic models and the anti-models, and the duration of each phoneme. In the second step, each recognized phoneme is evaluated using the speech recognition results and the reference phoneme sequence. For the automatic pronunciation evaluation experiments, English is selected as the target language and Korean speakers as the non-native language learners. The experiments show that the proposed method achieves an average of the false rejection rate (FRR) and the false alarm rate (FAR) of 32.4%, which outperforms both an anti-model based method and a pronunciation variant based method.
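
To make the two steps concrete, the following is a minimal sketch of how the per-phoneme outputs of the recognition step could be turned into accept/reject decisions and then scored. It is an illustration under stated assumptions, not the scoring rule of this paper: the abstract does not give the decision formula, so the duration-normalized log-likelihood ratio, the threshold value, and the names `PhonemeResult`, `llr_score`, `is_accepted`, and `frr_far` are all hypothetical. FRR and FAR are computed here under the common convention that FRR is the fraction of correctly pronounced phonemes that are rejected and FAR is the fraction of mispronounced phonemes that are accepted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PhonemeResult:
    """Per-phoneme output of the recognition step (hypothetical container)."""
    recognized: str   # phoneme chosen by Viterbi decoding
    reference: str    # phoneme expected by the reference sequence
    ll_model: float   # log-likelihood under the acoustic model
    ll_anti: float    # log-likelihood under the anti-model
    frames: int       # phoneme duration in frames


def llr_score(p: PhonemeResult) -> float:
    """Duration-normalized log-likelihood ratio (an assumed scoring rule)."""
    if p.frames <= 0:
        raise ValueError("phoneme duration must be positive")
    return (p.ll_model - p.ll_anti) / p.frames


def is_accepted(p: PhonemeResult, threshold: float = 0.0) -> bool:
    """Accept only if the recognized phoneme matches the reference
    and the model beats the anti-model by at least `threshold` per frame."""
    return p.recognized == p.reference and llr_score(p) >= threshold


def frr_far(decisions: List[bool], labels: List[bool]) -> Tuple[float, float, float]:
    """Return (FRR, FAR, their average).

    decisions[i] is True if the system accepted phoneme i;
    labels[i] is True if a human rater judged phoneme i as correct.
    """
    on_correct = [d for d, ok in zip(decisions, labels) if ok]
    on_wrong = [d for d, ok in zip(decisions, labels) if not ok]
    if not on_correct or not on_wrong:
        raise ValueError("need at least one correct and one incorrect label")
    frr = sum(1 for d in on_correct if not d) / len(on_correct)
    far = sum(1 for d in on_wrong if d) / len(on_wrong)
    return frr, far, (frr + far) / 2.0
```

In such a setup, the threshold would typically be tuned on held-out data, and the average of FRR and FAR serves as a single figure for comparing systems, as in the result reported above.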