CASTLE: a computer-assisted stress teaching and learning environment for learners of English as a second language

In this paper, we describe the principles and functionality of the Computer-Assisted Stress Teaching and Learning Environment (CASTLE), which we have proposed and developed to help learners of English as a Second Language (ESL) learn English stress patterns. The CASTLE system comprises three modules. The first, the individualised speech learning material module, provides learners with speech material that has their preferred voice features, e.g., gender, pitch and speech rate. The second, the perception assistance module, helps learners perceive English stress patterns correctly by automatically exaggerating the differences between stressed and unstressed syllables in a teacher’s voice. The third, the production assistance module, helps learners become aware of the rhythm of English and gives them feedback to improve their production of stress patterns.
