Toward Multi-Features Emphasis Speech Translation: Assessment of Human Emphasis Production and Perception with Speech and Text Clues

Emphasis is an important aspect of human speech that helps convey emotion and the focal information of an utterance. Recent studies on speech-to-speech translation have sought to preserve emphasis information from the source language in the target language. However, because cultures differ in how they express emphasis, translating emphasis only from acoustic features to acoustic features may not always reflect users' experiences. Moreover, emphasis can be expressed at various levels in both text and speech. It remains unclear how people communicate emphasis in different forms (acoustic or linguistic) and at different levels, whether listeners can perceive differences between emphasis levels, and whether they recognize the same emphasis level when it is expressed in text and in speech. In this paper, we analyze human perception of emphasis from both speech and text clues through crowdsourced evaluations. The results indicate that although participants can distinguish among emphasis levels and perceive the same emphasis level across speech and text, considerable ambiguity remains at certain emphasis levels. Our results therefore offer insight into what must be handled during the emphasis translation process.
