Predicting next speaker and timing from gaze transition patterns in multi-party meetings

In multi-party meetings, participants must predict when the current speaker's utterance will end and who will speak next, and they must time their own next utterance accordingly. Gaze behavior plays an important role in smooth turn-taking. This paper proposes a mathematical prediction model with three processing steps that predicts (I) whether turn-taking or turn-keeping will occur, (II) who the next speaker will be in the case of turn-taking, and (III) when the next speaker's utterance will start. As the model's features, we focus on gaze transition patterns near the end of an utterance. We collected a corpus of multi-party meetings and analyzed how the appearance frequencies of gaze transition patterns differ across the situations in (I), (II), and (III). On the basis of this analysis, we constructed a probabilistic mathematical model that uses the appearance frequencies of all participants' gaze transition patterns. An evaluation of the model shows that the proposed models achieve high precision compared with models that do not take gaze transition patterns into account.
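The core idea of step (I), predicting turn-taking versus turn-keeping from the appearance frequencies of gaze transition patterns, can be sketched as a simple frequency-based (naive-Bayes-style) classifier. This is a minimal illustrative sketch, not the paper's actual model: the pattern labels, the class structure, and the add-one smoothing are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class GazeTurnPredictor:
    """Frequency-based predictor of turn-taking vs. turn-keeping from
    gaze transition patterns near the end of an utterance (illustrative
    sketch; labels and smoothing are assumptions, not the paper's model)."""

    def __init__(self):
        self.pattern_counts = defaultdict(Counter)  # outcome -> pattern freq.
        self.outcome_counts = Counter()             # outcome -> sample count
        self.vocab = set()                          # all patterns ever seen

    def train(self, samples):
        # samples: iterable of (patterns, outcome) pairs, e.g.
        # (["speaker->listener", "listener->averted"], "turn-taking")
        for patterns, outcome in samples:
            self.outcome_counts[outcome] += 1
            for p in patterns:
                self.pattern_counts[outcome][p] += 1
                self.vocab.add(p)

    def predict(self, patterns):
        # Score each outcome by its prior plus the (smoothed) log
        # frequencies of the observed patterns; return the best outcome.
        total = sum(self.outcome_counts.values())
        best_outcome, best_score = None, float("-inf")
        for outcome, n in self.outcome_counts.items():
            score = math.log(n / total)  # class prior
            denom = sum(self.pattern_counts[outcome].values()) + len(self.vocab)
            for p in patterns:
                # add-one smoothing so unseen patterns do not zero out a class
                score += math.log((self.pattern_counts[outcome][p] + 1) / denom)
            if score > best_score:
                best_outcome, best_score = outcome, score
        return best_outcome
```

Steps (II) and (III) could reuse the same machinery with next-speaker identities or discretized timing bins as the outcome labels.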