Visual Focus of Attention in Dynamic Meeting Scenarios

This paper presents our data collection and first evaluations of visual focus of attention in dynamic meeting scenes. We introduced moving focus targets and unforeseen interruptions into each meeting by guiding it along a predefined script of events that three participating actors were instructed to follow. The remaining meeting attendees were not briefed on the upcoming actions or the general purpose of the meeting, so we were able to capture their natural focus changes within this predefined dynamic scenario using an extensive setup of visual and acoustic sensors throughout our smart room. We present an adaptive approach to estimating visual focus of attention from head orientation under these unforeseen conditions and show that our system achieves an overall recognition rate of 59%, nine percentage points higher than choosing the best matching focus target directly from the observed head orientation angles.
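The baseline mentioned above, choosing the best matching focus target directly from the observed head orientation angles, can be sketched as a nearest-target lookup. The sketch below is illustrative only: the target names, directions, and the Euclidean combination of pan/tilt differences are assumptions, not details from the paper (which instead proposes an adaptive approach on top of such estimates).

```python
import math

def nearest_focus_target(head_pan, head_tilt, targets):
    """Baseline: pick the focus target whose direction (pan/tilt, degrees)
    is closest to the observed head orientation.
    `targets` maps target names to (pan, tilt) directions; all names and
    values used here are illustrative, not taken from the paper."""
    def angular_distance(a, b):
        # smallest absolute difference between two angles, in degrees
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    best, best_cost = None, float("inf")
    for name, (pan, tilt) in targets.items():
        # combine pan and tilt deviations into one cost (an assumption)
        cost = math.hypot(angular_distance(head_pan, pan),
                          angular_distance(head_tilt, tilt))
        if cost < best_cost:
            best, best_cost = name, cost
    return best

# Hypothetical targets in a meeting room, directions relative to one attendee
targets = {"speaker": (-30.0, 0.0), "whiteboard": (40.0, 10.0), "table": (0.0, -25.0)}
print(nearest_focus_target(-25.0, 2.0, targets))  # -> speaker
```

An adaptive system like the one described would refine this mapping over time, e.g. by adjusting per-person offsets between head orientation and actual gaze direction.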