Head pose estimation & TV Context: current technology

With the arrival of low-cost, high-quality cameras, implicit tracking of user behaviour has become easier, which makes it attractive for viewer modelling and content personalization in a TV context. In this paper, we compare three common algorithms for automatically extracting the head direction of a person watching TV in a realistic setting. These algorithms compute the rotation angles of the head (pitch, roll, yaw) non-invasively and continuously from 2D and/or 3D features acquired with low-cost cameras. Their results are compared against a reference provided by the commercial Qualisys motion capture system, a robust marker-based tracking system. The performance of the algorithms is compared as a function of different configurations. While our results show that fully implicit behaviour tracking in real-life TV setups remains a challenge, with the arrival of next-generation sensors (such as the new Kinect One sensor), accurate TV personalization based on implicit behaviour is close to becoming a very interesting option.
