Automatic inference of cross-modal nonverbal interactions in multiparty conversations

A novel probabilistic framework is proposed for analyzing cross-modal nonverbal interactions in multiparty face-to-face conversations. The goal is to determine “who responds to whom, when, and how” from multimodal cues including gaze, head gestures, and utterances. We formulate this problem as probabilistic inference of the causal relationships among participants’ behaviors, specifically head gestures and utterances. To solve it, this paper proposes a hierarchical probabilistic model in which the structures of interactions are probabilistically determined by high-level conversation regimes (such as monologue or dialogue) and gaze directions. Based on the model, the interaction structures, gaze directions, and conversation regimes are simultaneously inferred from observed head motion and utterances using a Markov chain Monte Carlo method. Head gestures, including nodding, shaking, and tilting, are recognized from magnetic-sensor signals with a novel wavelet-based technique, and utterances are detected from audio captured by lapel microphones. Experiments on four-person conversations confirm the effectiveness of the framework in discovering interactions such as question-and-answer exchanges and addressing behavior followed by back-channel responses.
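The abstract mentions recognizing head gestures from magnetic-sensor signals with a wavelet-based technique. The paper's actual features, wavelet family, and classifier are not given here, so the following is only a minimal illustrative sketch of the general idea: a multi-level Haar wavelet decomposition of a head-motion trace, whose per-scale detail energies separate periodic motion (nod-like oscillation) from slow drift. The signal, level count, and Haar choice are all assumptions for illustration.

```python
import math

def haar_detail(signal):
    # One level of the discrete Haar wavelet transform:
    # returns (approximation, detail) coefficient lists.
    approx, detail = [], []
    for i in range(0, len(signal) - 1, 2):
        approx.append((signal[i] + signal[i + 1]) / math.sqrt(2))
        detail.append((signal[i] - signal[i + 1]) / math.sqrt(2))
    return approx, detail

def band_energies(signal, levels=3):
    # Energy of the detail coefficients at each decomposition level.
    # Periodic head motion (e.g., nodding) concentrates energy in a
    # narrow band of scales, while slow drift leaves little detail energy.
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_detail(approx)
        energies.append(sum(d * d for d in detail))
    return energies

# Hypothetical head-pitch traces: a nod-like oscillation vs. a slow drift.
nod = [math.sin(2 * math.pi * i / 8) for i in range(64)]
drift = [i / 64 for i in range(64)]

print(band_energies(nod))    # most of the signal energy appears here
print(band_energies(drift))  # near-zero detail energy at every scale
```

A real detector in this spirit would compute such multi-scale features over sliding windows of the sensor signal and feed them to a classifier distinguishing nod, shake, and tilt; the sketch shows only the feature side.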
