To recognize non-native speech, larger acoustic and linguistic distortions must be handled adequately in acoustic modeling, language modeling, lexical modeling, and/or the decoding strategy. In this paper, a novel method to enhance MLLR adaptation of acoustic models for non-native speech recognition is proposed. For native speech recognition, MLLR speaker adaptation has been introduced successfully because it enables efficient adaptation with a small amount of adaptation data by using a regression tree over the Gaussian mixtures of the HMMs. For non-native speech, however, the regression tree built from the baseline HMMs in most cases does not match the pronunciation proficiency of the speaker. This paper provides a solution to this problem: the speaker's proficiency is estimated automatically and a tree suited to that proficiency is built, which can be viewed as proficiency adaptation. Recognition experiments show that MLLR with the new tree raises the average error reduction rate to about 30%, compared with approximately 20% for the baseline MLLR.
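As a rough illustration of the regression-tree MLLR mechanism the abstract refers to (not the paper's proficiency-adapted tree itself), the sketch below groups Gaussians into regression classes and estimates a shared affine mean transform per class only when enough adaptation data has accumulated, falling back to the baseline means otherwise. All function names, the statistics format, and the min_count threshold are illustrative assumptions, and the transform estimation shown is a weighted least-squares simplification of the maximum-likelihood MLLR solution.

```python
import numpy as np

def mllr_transform_for_class(means, stats):
    """Estimate an affine transform (W, b) for one regression class.

    means : (G, D) array of baseline Gaussian mean vectors in the class.
    stats : list of (gamma_g, sum_x_g) sufficient statistics per Gaussian,
            where gamma_g is the occupation count and sum_x_g the weighted
            sum of adaptation frames assigned to Gaussian g.
    """
    D = means.shape[1]
    # Fit adapted_mean ~= W @ mu + b for the class's Gaussians, weighting each
    # Gaussian by its occupation count (a least-squares stand-in for the ML solution).
    X, Y, w = [], [], []
    for (gamma, sum_x), mu in zip(stats, means):
        if gamma > 0:
            X.append(np.append(mu, 1.0))   # extended mean vector [mu; 1]
            Y.append(sum_x / gamma)        # empirical adapted mean for this Gaussian
            w.append(gamma)
    X, Y = np.array(X), np.array(Y)
    sw = np.sqrt(np.array(w))[:, None]     # sqrt weights for weighted least squares
    Wext, *_ = np.linalg.lstsq(X * sw, Y * sw, rcond=None)
    W, b = Wext[:D].T, Wext[D]
    return W, b

def adapt_means(tree_classes, all_means, all_stats, min_count=100.0):
    """Walk the regression-tree leaves and adapt each class that has enough data.

    tree_classes : list of leaves, each a list of Gaussian indices.
    all_means    : (N, D) array of all baseline Gaussian means.
    all_stats    : list of (gamma, sum_x) statistics, one entry per Gaussian.
    """
    adapted = all_means.copy()
    for gauss_ids in tree_classes:
        gamma_total = sum(all_stats[g][0] for g in gauss_ids)
        if gamma_total < min_count:        # too little data: keep baseline means
            continue
        W, b = mllr_transform_for_class(all_means[gauss_ids],
                                        [all_stats[g] for g in gauss_ids])
        adapted[gauss_ids] = all_means[gauss_ids] @ W.T + b
    return adapted
```

The paper's contribution would correspond, in this sketch, to choosing the tree_classes grouping according to the speaker's estimated proficiency rather than deriving it from the baseline HMMs alone.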