Implementing Acoustic-Prosodic Entrainment in a Conversational Avatar

Entrainment, also known as accommodation or alignment, is the phenomenon by which conversational partners become more similar to each other in their behavior. While there has been much work on entrainment in some behaviors, there has been little on entrainment in speech, and even less on how to build Spoken Dialogue Systems (SDS) that entrain to their users’ speech. We present an architecture and algorithm for implementing acoustic-prosodic entrainment in SDS and show that speech produced under this algorithm conforms to the feature targets, satisfying the properties of entrainment behavior observed in human-human conversations. We also present results of an extrinsic evaluation of this method, which compares whether subjects are more likely to ask advice from a conversational avatar that entrains than from one that does not, in English, Spanish, and Slovak SDS.
