Paper information - Methods for pronunciation assessment in computer aided language learning


Learning a foreign language is a challenging endeavor that entails acquiring a wide range of new knowledge, including words, grammar, gestures, and sounds. Mastering these skills requires extensive practice by the learner, and opportunities to practice may not always be available. Computer Aided Language Learning (CALL) systems provide non-threatening environments where foreign language skills can be practiced wherever and whenever a student desires. These systems often incorporate several technologies to identify the different types of errors made by a student. This thesis focuses on the problem of identifying mispronunciations made by a foreign language student using a CALL system. We make several assumptions about the nature of the learning activity: it takes place using a dialogue system, it is a task- or game-oriented activity, the student should not be interrupted by the pronunciation feedback system, and the goal of the feedback system is to identify severe mispronunciations with high reliability. Detecting mispronunciations requires a corpus of speech with human judgements of pronunciation quality. Typical approaches to collecting such a corpus use an expert phonetician to both phonetically transcribe the corpus and assign a judgement of quality to each phone in it. This is time consuming and expensive, and it places an extra burden on the transcriber. We describe a novel method for obtaining phone level judgements of pronunciation quality by utilizing non-expert, crowd-sourced, word level judgements of pronunciation. Foreign language learners typically exhibit high variation and pronunciation shapes distinct from those of native speakers, which makes analysis for mispronunciation difficult. We detail a simple but effective method for transforming the vowel space of non-native speakers to make mispronunciation detection more robust and accurate.
We show that this transformation not only enhances performance on a simple classification task, but also results in distributions that can be better exploited for mispronunciation detection. This transformation of the vowel space is exploited to train a mispronunciation detector using a variety of features derived from acoustic model scores and vowel class distributions. We confirm that the transformation technique results in more robust and accurate identification of mispronunciations than traditional acoustic models. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

Mitchell Peabody
