Conversation scene analysis based on dynamic Bayesian network and image-based gaze detection

This paper presents a probabilistic framework, incorporating automatic image-based gaze detection, for inferring the structure of multiparty face-to-face conversations. The framework infers conversation regimes and gaze patterns from the nonverbal behaviors of meeting participants, which are captured as image and audio streams by cameras and microphones. The conversation regime corresponds to a global conversational pattern such as monologue or dialogue, and the gaze pattern indicates "who is looking at whom". The input nonverbal behaviors are the presence or absence of utterances, head directions, and discrete head-centered eye-gaze directions. In contrast to conventional meeting analysis methods, which rely only on each participant's head pose as a surrogate for visual focus of attention, this paper incorporates vision-based gaze detection, combined with head pose tracking, into a probabilistic conversation model based on a dynamic Bayesian network. Our gaze detector can differentiate three to five eye-gaze directions, e.g., left, straight, and right. Experiments on four-person conversations confirm that the proposed framework identifies conversation structure and estimates gaze patterns with higher accuracy than previous models.
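To illustrate why a discrete head-centered eye-gaze direction can sharpen a "who is looking at whom" estimate beyond head pose alone, the following is a minimal deterministic sketch, not the paper's probabilistic model. Everything in it is an assumption made for illustration: the function name `gaze_target`, the angular offsets assigned to each eye-gaze class, the seating angles, the sign convention (positive yaw to the right), and the tolerance threshold. Angle wraparound is ignored.

```python
from typing import Dict, Optional

# Hypothetical yaw offsets (degrees) for three discrete, head-centered
# eye-gaze classes. These values are illustrative, not from the paper.
EYE_OFFSET_DEG: Dict[str, float] = {"left": -20.0, "straight": 0.0, "right": 20.0}


def gaze_target(
    head_yaw_deg: float,
    eye_class: str,
    others_yaw_deg: Dict[str, float],
    max_error_deg: float = 15.0,
) -> Optional[str]:
    """Return the participant closest to the estimated gaze direction.

    head_yaw_deg   -- head direction of the observed person
    eye_class      -- discrete eye-gaze direction relative to the head
    others_yaw_deg -- direction of each other participant as seen from
                      the observed person (assumed seating layout)
    Returns None when no participant lies within max_error_deg.
    """
    # Gaze direction is approximated as head direction plus eye offset.
    gaze_deg = head_yaw_deg + EYE_OFFSET_DEG[eye_class]
    # Pick the participant with the smallest angular difference.
    best = min(others_yaw_deg, key=lambda name: abs(others_yaw_deg[name] - gaze_deg))
    if abs(others_yaw_deg[best] - gaze_deg) > max_error_deg:
        return None
    return best


# Assumed four-person layout, seen from participant A.
seats = {"B": -40.0, "C": 0.0, "D": 40.0}

head_only = gaze_target(10.0, "straight", seats)  # head pose used as the only cue
with_eyes = gaze_target(10.0, "right", seats)     # head pose plus eye-gaze class
print(head_only, with_eyes)
```

In this toy layout the same head direction maps to different targets depending on the eye-gaze class, which is the ambiguity that head-pose-only methods cannot resolve. The framework described above treats these cues probabilistically within a dynamic Bayesian network rather than with a fixed threshold.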
