Head Pose Tracking and Focus of Attention Recognition Algorithms in Meeting Rooms

This paper evaluates head pose and visual focus of attention (VFOA) estimation algorithms in a meeting room environment. Head orientation is estimated with a Rao-Blackwellized mixed-state particle filter that performs joint head localization and pose estimation. The output of this tracker is then fed to a Hidden Markov Model (HMM) to estimate people's VFOA. Unlike previous studies on the topic, our set-up does not restrict the potential VFOA of a person to the other meeting participants: it also includes environmental targets (table, slide screen), which makes the task harder because the directions of the VFOA targets are more ambiguous. We rely on a corpus of 8 meetings, averaging 8 minutes each, in which 4 people discuss statements projected on a slide screen, and for which head orientation ground truth was obtained with magnetic sensor devices. On this corpus we thoroughly assess the performance of the above algorithms, demonstrating the validity of our approaches and pointing to directions for further research.
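To make the second stage more concrete, the following is a minimal sketch of how a sequence of head pan angles could be decoded into VFOA targets with an HMM and the Viterbi algorithm. It is not the model evaluated in the paper: the abstract does not specify the observation or transition models, so the Gaussian emission on pan angle alone, the shared standard deviation `sigma`, the "sticky" transition probability `stay_prob`, the function name `viterbi_vfoa`, and the target angles used below are all assumptions chosen for illustration.

```python
import math


def viterbi_vfoa(pan_angles, target_means, sigma=15.0, stay_prob=0.9):
    """Most likely VFOA target per frame, given head pan angles in degrees.

    Illustrative assumptions (not taken from the paper):
      - each target emits pan angles from a Gaussian centred on its mean,
        with the same standard deviation `sigma` for every target;
      - the gaze stays on the same target with probability `stay_prob` and
        otherwise moves to any other target with equal probability;
      - the prior over targets is uniform.
    """
    targets = list(target_means)
    n = len(targets)
    if n < 2:
        raise ValueError("need at least two candidate targets")
    if not pan_angles:
        return []

    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n - 1))

    def log_emission(angle, target):
        # Gaussian log-likelihood up to a constant shared by all targets.
        z = (angle - target_means[target]) / sigma
        return -0.5 * z * z

    def log_transition(i, j):
        return log_stay if i == j else log_switch

    # Initialisation with a uniform prior.
    scores = [-math.log(n) + log_emission(pan_angles[0], t) for t in targets]
    backpointers = []

    # Recursion: keep the best predecessor for every target at every frame.
    for angle in pan_angles[1:]:
        new_scores, pointers = [], []
        for j, target in enumerate(targets):
            best_i = max(range(n), key=lambda i: scores[i] + log_transition(i, j))
            new_scores.append(
                scores[best_i] + log_transition(best_i, j) + log_emission(angle, target)
            )
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)

    # Backtracking from the best final state.
    state = max(range(n), key=lambda i: scores[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [targets[i] for i in path]


# Hypothetical target directions (degrees of pan) for one seated participant.
means = {"person_left": -60.0, "slide_screen": 0.0, "table": 50.0}
angles = [-58, -62, -20, -55, 2, -3, 5, 48, 52]
decoded = viterbi_vfoa(angles, means)
```

In this toy sequence the third frame (-20 degrees) lies closer to the slide screen than to the person on the left, yet the transition model keeps the decoded target on `person_left`, because a brief excursion costs two switches. This temporal smoothing is the main reason to use an HMM rather than classifying each frame independently, and it matters more when target directions are close together, as with the environmental targets described above.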
