Recognition of three-party conversation using prosody and gaze

We have developed a recognition system that understands multi-party conversation from the combined information of prosody and gaze. Multi-party conversation is complex because side participants generate many overlaps and interruptions, which makes it difficult to keep track of the main thread of the conversation. Gaze is a strong cue for clarifying and perceiving "who is talking to whom" and "who is listening to whom", and it can be used to improve understanding of the conversational situation. We analyzed gaze behavior in conversational situations based on recordings of actual human-to-human conversations and built a computational model that recognizes the main thread of the conversation. Recognition performance improved by up to 20 points compared with using prosody alone.
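As an illustration only, the sketch below shows one way prosody and gaze cues might be fused to pick out the main thread (the active speaker-addressee pair) in a three-party conversation. It is not the authors' model; the feature names, weights, and scoring rule are hypothetical.

```python
# Illustrative sketch (not the paper's implementation): fuse prosodic activity
# and gaze direction to score which (speaker, addressee) pair carries the main
# thread of a three-party conversation. All features and weights are assumed.

from dataclasses import dataclass
from itertools import permutations

@dataclass
class ParticipantState:
    speaking_energy: float   # assumed prosodic cue: normalized speech energy
    pitch_activity: float    # assumed prosodic cue: normalized F0 variation
    gaze_target: str         # id of the participant currently being looked at

def main_thread(states: dict[str, ParticipantState]) -> tuple[str, str]:
    """Return the (speaker, addressee) pair with the highest fused score."""
    best_pair, best_score = None, float("-inf")
    for speaker, addressee in permutations(states, 2):
        s = states[speaker]
        # Prosody: how actively the candidate speaker is talking.
        prosody = 0.6 * s.speaking_energy + 0.4 * s.pitch_activity
        # Gaze: mutual gaze between speaker and addressee strengthens the pair.
        gaze = (1.0 if s.gaze_target == addressee else 0.0) \
             + (1.0 if states[addressee].gaze_target == speaker else 0.0)
        score = prosody + gaze  # weights are illustrative only
        if score > best_score:
            best_pair, best_score = (speaker, addressee), score
    return best_pair

# Example: A speaks energetically while A and B look at each other, so (A, B)
# is recognized as the main thread and C is treated as a side participant.
states = {
    "A": ParticipantState(0.9, 0.8, "B"),
    "B": ParticipantState(0.1, 0.2, "A"),
    "C": ParticipantState(0.3, 0.4, "A"),
}
print(main_thread(states))  # ('A', 'B')
```

In practice such scores would come from a trained model rather than fixed weights, but the sketch shows how gaze can disambiguate the addressee when prosody alone cannot.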
