MultiMediate: Multi-modal Group Behaviour Analysis for Artificial Mediation

Artificial mediators are a promising means of supporting human group conversations, but at present their abilities are limited by insufficient progress in group behaviour analysis. The MultiMediate challenge addresses, for the first time, two fundamental group behaviour analysis tasks in well-defined conditions: eye contact detection and next speaker prediction. For training and evaluation, MultiMediate makes use of the MPIIGroupInteraction dataset, consisting of 22 three- to four-person discussions, as well as an unpublished test set of six additional discussions. This paper describes the MultiMediate challenge and presents the challenge dataset, including novel fine-grained speaking annotations that were collected for the purpose of MultiMediate. Furthermore, we present baseline approaches and ablation studies for both challenge tasks.