Prosodic Facilitation and Interference While Judging on the Veracity of Synthesized Statements

Human speech carries two primary sources of information: the verbal channel encodes linguistic content, while the vocal channel transmits paralinguistic information, mainly through prosody. Following several studies that induce a conflict between these two channels to better understand the role of prosody, we conducted an experiment in which subjects listened to a series of statements synthesized with varying prosody and indicated whether they believed each one to be true or false. We find evidence that acoustic/prosodic (a/p) features of the synthesized statements affect response times (a well-known proxy for cognitive load). Our results suggest that prosody in synthesized speech may either facilitate or interfere with subjects’ judgments of a statement’s truthfulness. Furthermore, this pattern is amplified when the a/p features of the synthesized statements are analyzed relative to the subjects’ own a/p features. This suggests that the entrainment of TTS voices has serious implications for the perceived trustworthiness of the system’s skills.
