SLIM prosodic automatic tools for self-learning instruction

Abstract We present the Prosodic Module of SLIM, a courseware package for computer-assisted foreign language learning developed at the University of Venice; the name is an acronym for Multimedia Interactive Linguistic Software (see Delmonte et al., 1999a,b). The Prosodic Module was created to improve a student's performance in both the perception and the production of prosodic aspects of spoken language. It comprises two sets of Learning Activities. The first deals with phonetic and prosodic problems at word level and at segmental level, where "segmental" refers to syllable-sized segments. The second deals with prosodic aspects at the suprasegmental level of the phonological phrase and the utterance. The main goal of the Prosodic Activities is to give consistent and pedagogically sound feedback to a student who wants to improve their pronunciation in a foreign language. We argue that Automatic Speech Recognition (ASR) should be used sparingly as a Teaching Aid and restricted to narrowly focussed spoken exercises, excluding open-ended dialogues, in order to ensure consistency of evaluation. We further argue that ASR alone cannot be used to gauge Goodness of Pronunciation (GOP), being inherently inadequate for that goal. Instead, we support the combined use of ASR technology and prosodic tools to produce GOP scores usable for linguistically consistent and adequate feedback to the student.
