Applying linguistic knowledge of language transfer to automatic speech recognition (ASR) technology can enhance mispronunciation detection in computer-aided pronunciation training (CAPT) by pinpointing salient pronunciation errors made by second language learners. In this work, we propose decision fusion to further improve mispronunciation detection performance. The detection decision from the linguistically motivated approach, which applies language transfer knowledge, is used as the basis. When this basis is "less reliable", the system backs off to posterior-probability-based pronunciation scoring with phoneme-dependent thresholds. Fusion helps combat problems such as incomplete coverage of the linguistic knowledge and imperfect acoustic models in ASR. Our fusion strategy can maintain the diagnostic capability of the linguistically motivated approach while achieving a major boost in detection performance. Experimental results show that decision fusion yields a relative reduction of up to 30% in the total number of decision errors.
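The per-phone fusion rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how a linguistic decision is judged "less reliable", so `rule_reliable` is a hypothetical flag assumed to be computed elsewhere. The names `PhoneSegment`, `fuse_decision`, `count_decision_errors` and `relative_error_reduction`, the fallback `default_threshold` for phonemes without a tuned threshold, the use of an average log posterior as the score, and all numeric values are also hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PhoneSegment:
    phone: str                     # canonical phoneme expected at this position
    rule_decision: Optional[bool]  # language-transfer rule decision; None if no rule covers this phone
    rule_reliable: bool            # HYPOTHETICAL flag: is the rule-based decision trusted here?
    log_posterior: float           # ASSUMED score: average log posterior of the canonical phone


def fuse_decision(seg: PhoneSegment,
                  thresholds: Dict[str, float],
                  default_threshold: float) -> bool:
    """Return True if the phone is flagged as mispronounced."""
    # Basis: keep the linguistically motivated decision when it exists and is
    # trusted, so the diagnosis tied to the firing rule is preserved.
    if seg.rule_decision is not None and seg.rule_reliable:
        return seg.rule_decision
    # Back-off: compare the posterior score with a phoneme-dependent threshold
    # (the default for untuned phonemes is an assumption of this sketch).
    threshold = thresholds.get(seg.phone, default_threshold)
    return seg.log_posterior < threshold


def count_decision_errors(predicted: List[bool], reference: List[bool]) -> int:
    """Total decision errors = false acceptances + false rejections."""
    return sum(p != r for p, r in zip(predicted, reference))


def relative_error_reduction(baseline_errors: int, fused_errors: int) -> float:
    """Fraction of the baseline's decision errors removed by fusion."""
    return (baseline_errors - fused_errors) / baseline_errors


# Illustrative usage with made-up thresholds and scores.
thresholds = {"th": -2.0, "r": -1.5}
segments = [
    PhoneSegment("th", True, True, -0.5),   # trusted rule fires -> flagged
    PhoneSegment("th", None, False, -3.0),  # no rule; posterior below threshold -> flagged
    PhoneSegment("r", True, False, -1.0),   # rule untrusted; posterior above threshold -> accepted
]
decisions = [fuse_decision(s, thresholds, default_threshold=-2.0) for s in segments]
```

Treating a mismatch with the reference label as either a false acceptance or a false rejection gives a single error count per system. Comparing the counts of the rule-only baseline and the fused system with `relative_error_reduction` then expresses the gain as a relative reduction in total decision errors, the form in which the abstract reports its result.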