Recognition and verification of English by Japanese students for computer-assisted language learning system

We address methods for recognizing English spoken by Japanese students as the basis for our Computer-Assisted Language Learning (CALL) system. For automatic phonemic error detection, pronunciation errors are predicted for a given orthographic text. To improve reliability, speaker adaptation is applied as pre-processing and segment-input pair-wise verification as post-processing. We also address acoustic modeling as a means of coping with the large acoustic variation seen in nonnative speech. First, English acoustic models are trained using a database of English spoken by Japanese students. Japanese phonemes that are regarded as allophones of English phonemes are then incorporated. We present an experimental comparison of these models and confirm the effectiveness of speaker adaptation and pair-wise verification.