Synthetic speech in foreign language learning: an evaluation by learners

Can synthetic speech be used in foreign language learning in the same way as natural speech? In this paper, we evaluated synthetic speech from the learners' viewpoint to find an answer. The results indicate that learners perceive no marked differences between synthetic and natural voices for words with short and long vowels when they try to understand what the sounds mean. The data also show that synthetic voice utterances of sentences are easier to understand and more acceptable to learners than synthetic voice utterances of words. In addition, the ratings of both synthetic and natural voices depend strongly on the learners’ listening comprehension abilities. We conclude that some synthetic speech with specific pronunciations of vowels may be suitable for listening materials. We also suggest that evaluating TTS systems by comparing synthetic speech with natural speech, and building a lexical database of synthetic speech that closely approximates natural speech, would help teachers readily use the many existing CALL tools.
