A study on the effect of prosodic emphasis transfer on overall speech translation quality

Despite increasing interest in speech-to-speech (S2S) translation, research and development have focused almost exclusively on the lexical aspects of translation. The importance of transferring prosodic and other paralinguistic information through S2S devices, and its impact on translation quality, is yet to be well established. The novelty of this work is a large-scale human evaluation study that tests the hypothesis that cross-lingual prosodic emphasis transfer is directly related to the perceived quality of speech translation. On the data sets considered, this hypothesis is supported by correlations of 0.53 to 0.54, with results significant at p = 0.01. The second contribution of this work is a crowdsourcing-based evaluation methodology that uses English-Spanish bilingual data from two distinct domains and was evaluated with over 200 bilingual speakers. We also present lessons learned from running this type of subjective S2S experiment with crowdsourcing.
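To make the reported statistic concrete, the following is a minimal sketch of how a correlation between emphasis-transfer ratings and translation-quality ratings, together with a significance estimate, might be computed. It assumes a Pearson correlation and a permutation test, neither of which is stated in the abstract; the function names and the ratings are hypothetical and are not the study's data.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value estimate: how often a random pairing of the
    ratings yields a correlation at least as strong as the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-utterance ratings on a 1-5 scale (illustrative only).
emphasis_transfer = [1, 2, 2, 3, 4, 4, 5, 5]
translation_quality = [2, 1, 3, 3, 3, 5, 4, 5]

r = pearson_r(emphasis_transfer, translation_quality)
p = permutation_p_value(emphasis_transfer, translation_quality)
print(f"r = {r:.2f}, permutation p = {p:.3f}")
```

A permutation test is used here only because it needs no distributional assumptions and no external libraries; a rank-based correlation would be an equally plausible choice for ordinal ratings.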
