Inferring state transition from bystander to participant in free-style conversational interaction

We propose a novel method for inferring the state transition from bystander to participant in free-style conversational interactions, using physical behaviors captured by cameras and a microphone. Existing methods address participants and a presenter, but they do not consider bystanders, who play an important role in the interaction. In cognitive science, an existing model describes the psychological process of changing from bystander to participant; however, this model is difficult to implement because inferring the psychological states of bystanders is challenging. Instead of relying on psychological states, our method exploits physical behaviors such as standing position, facial direction, and voice direction. Using datasets collected from poster presentations, we analyzed which behavioral parameters improve the performance of inferring state transitions.
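To make the idea concrete, the sketch below shows one way the three physical behaviors named above (standing position, facial direction, voice direction) could be combined to detect a bystander-to-participant transition. This is a minimal illustration, not the paper's actual method: the feature names, weights, thresholds, and the consecutive-frame rule are all assumptions introduced here for clarity.

```python
import math
from dataclasses import dataclass


@dataclass
class BehaviorFrame:
    """Hypothetical per-frame observations of one bystander."""
    distance_to_group: float  # meters from the conversational group's center
    facing_angle: float       # radians between facial direction and group center
    is_speaking: bool         # whether voice direction is attributed to this person


def transition_score(frame: BehaviorFrame) -> float:
    """Combine the three behaviors into a single score in [0, 1].

    The weights (0.4 / 0.3 / 0.3) and the 3 m proximity range are
    illustrative choices, not parameters from the paper.
    """
    proximity = max(0.0, 1.0 - frame.distance_to_group / 3.0)  # closer -> higher
    attention = max(0.0, math.cos(frame.facing_angle))         # facing group -> higher
    speech = 1.0 if frame.is_speaking else 0.0                 # speaking -> higher
    return 0.4 * proximity + 0.3 * attention + 0.3 * speech


def infer_transition(frames, threshold=0.6, min_frames=3):
    """Declare a bystander-to-participant transition once the score stays
    above `threshold` for `min_frames` consecutive frames; return the index
    of the frame where the transition is inferred, or None."""
    streak = 0
    for i, frame in enumerate(frames):
        streak = streak + 1 if transition_score(frame) >= threshold else 0
        if streak >= min_frames:
            return i
    return None
```

In a real system these weights would be learned from annotated data (such as the poster-presentation datasets mentioned above) rather than set by hand, and the smoothing over consecutive frames would likely be replaced by a temporal model.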
