Feedback utterances for computer-aided language learning using accent reduction and voice conversion methods

This paper considers the generation of feedback utterances for training the speaking skills of non-native English learners. The proposed feedback combines the learner's own voice with the linguistic gestures, i.e., the prosody or pronunciation, of a native speaker. Both an accent reduction method and a voice conversion method are employed to generate the feedback stimuli. For accent reduction, three speech synthesis methods, namely pitch-synchronous overlap and add (PSOLA), the harmonic plus stochastic model (HSM), and speech transformation and representation by adaptive interpolation of weighted spectrogram (STRAIGHT), are used to reduce the accent in the learners' utterances. For voice conversion, the teacher's voice is converted to that of the learner, and the converted speech is used as feedback. Objective measurements are employed to assess the nativeness and acoustic quality of the generated stimuli. A feedback scheme that combines the accent reduction and voice conversion methods is also proposed.
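The prosody side of accent reduction can be illustrated with time-domain PSOLA: the learner's own waveform segments are kept, but they are re-spaced so that the fundamental frequency (F0) follows a native speaker's contour. The sketch below is a hypothetical illustration, not the authors' implementation. It assumes that pitch marks for the learner's utterance are already available from an external detector, and that the native F0 contour has been time-aligned to the learner's utterance beforehand (for example with dynamic time warping) and is positive at every sample. It changes only F0; duration and spectral (pronunciation) modification are out of scope. The function name `psola_transplant_f0` and the synthetic test signal are illustrative choices.

```python
import numpy as np


def psola_transplant_f0(x, marks, target_f0, fs):
    """Time-domain PSOLA sketch: impose a target F0 contour on x while
    keeping its duration (hypothetical helper, not from the paper).

    x:         learner's waveform (1-D array)
    marks:     analysis pitch marks in x, strictly increasing sample indices
               within [0, len(x) - 1], at least two (assumed to come from an
               external pitch-mark detector)
    target_f0: desired F0 in Hz for every sample of x (assumed already
               time-aligned to x and > 0 everywhere)
    fs:        sampling rate in Hz
    """
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks, dtype=int)
    target_f0 = np.asarray(target_f0, dtype=float)

    # Local pitch period at each analysis mark; the last mark reuses the
    # previous period.
    periods = np.diff(marks)
    periods = np.append(periods, periods[-1])

    # Zero-pad so two-period segments never run off either end.
    pad = int(periods.max())
    xp = np.pad(x, pad)
    y = np.zeros(len(x) + 2 * pad)
    wsum = np.zeros_like(y)

    t = float(marks[0])  # current synthesis pitch mark
    while t < marks[-1]:
        # Reuse the analysis segment whose mark is closest in time.
        k = int(np.argmin(np.abs(marks - t)))
        p = int(periods[k])
        win = np.hanning(2 * p + 1)  # two-period Hann window
        a = marks[k] + pad
        seg = xp[a - p : a + p + 1] * win

        # Overlap-add the windowed segment at the synthesis mark.
        c = int(round(t)) + pad
        y[c - p : c + p + 1] += seg
        wsum[c - p : c + p + 1] += win

        # The next synthesis mark lies one *target* period later.
        t += fs / target_f0[min(int(t), len(target_f0) - 1)]

    # Normalize by the summed windows so that raising F0 (more overlap)
    # does not raise the level; samples no window covers stay zero.
    wsum[wsum < 1e-8] = 1.0
    return (y / wsum)[pad : pad + len(x)]


# Usage: a 100 Hz sawtooth stands in for a learner's voiced speech, and a
# flat 200 Hz contour stands in for a native speaker's F0.
fs = 8000
n = np.arange(fs)
x = (n % 80) / 80 - 0.5
marks = np.arange(0, fs, 80)  # one mark per 80-sample period
y = psola_transplant_f0(x, marks, np.full(fs, 200.0), fs)
```

Dividing by the summed windows is one common implementation choice rather than part of the textbook TD-PSOLA formulation; without it, raising F0 increases the number of overlapping segments and hence the output level. In the usage example, the output keeps the input's length but repeats every 40 samples, i.e. its period now matches the 200 Hz target.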
