A multimodal analysis of synchrony during dyadic interaction using a metric based on sequential pattern mining

In human-human interaction, conversational partners tend to adapt to each other as the conversation progresses, mirroring each other's intonation, speech rate, fundamental frequency, word choice, hand gestures, and head movements. This phenomenon is variously known as synchrony, convergence, entrainment, or adaptation. Recent studies have investigated it at different dimensions and levels for single modalities, but how modalities interact at a local level to reveal synchrony between conversational partners remains an open question. This paper studies synchrony in dyadic conversations using a multimodal approach based on sequential pattern mining, analyzing both acoustic and text-based features at a local level. The proposed data-driven framework identifies frequent sequences containing events from multiple modalities that can quantify the synchrony between conversational partners (e.g., a speaker reducing speech rate after the other utters disfluencies). The evaluation relies on 90 sessions from the Fisher corpus, which comprises telephone conversations between two people. Using this framework, we develop a multimodal metric that quantifies synchrony between conversational partners, and we report initial results by comparing actual dyadic conversations with sessions artificially created by randomly pairing the speakers.
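The core idea of mining frequent multimodal sequences can be illustrated with a toy sketch. The snippet below is not the authors' framework (which relies on a full sequential pattern miner such as SPADE); it is a minimal stand-in that counts ordered event pairs occurring in at least `min_support` sessions, where each session is a hypothetical stream of `(speaker, feature_event)` tuples. All event names and data are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(sessions, min_support):
    """Toy sequential pattern mining: find ordered event pairs
    (a occurs before b) supported by at least min_support sessions.
    A real system would use a miner such as SPADE for longer patterns."""
    support = Counter()
    for events in sessions:
        # Collect each ordered pair once per session (set-based support).
        seen = set()
        for i, j in combinations(range(len(events)), 2):
            seen.add((events[i], events[j]))
        support.update(seen)
    return {pat: cnt for pat, cnt in support.items() if cnt >= min_support}

# Hypothetical multimodal event streams: (speaker, feature_event) tuples.
sessions = [
    [("A", "disfluency"), ("B", "slower_rate"), ("A", "pitch_drop")],
    [("A", "disfluency"), ("B", "slower_rate")],
    [("B", "slower_rate"), ("A", "disfluency")],
]
patterns = frequent_pairs(sessions, min_support=2)
# Speaker A's disfluency followed by speaker B slowing down is frequent
# (supported by 2 of 3 sessions), so it survives the support threshold.
print(patterns[(("A", "disfluency"), ("B", "slower_rate"))])  # → 2
```

Patterns that mix events from both speakers, like the one above, are the cross-speaker regularities a synchrony metric would aggregate; patterns that appear equally often in randomly paired sessions would be discounted.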
