Understanding Interactions and Guiding Visual Surveillance by Tracking Attention

The central tenet of this paper is that by determining where people are looking, other tasks involved with understanding and interrogating a scene are simplified. To this end we describe a fully automatic method to determine a person's attention based on real-time visual tracking of their head and a coarse classification of their head pose. We estimate the head pose, or coarse gaze, using randomised ferns with decision branches based on both histograms of gradient orientations and colour based features. We use the coarse gaze for three applications to demonstrate its value: (i) we show how by building static and temporally varying maps of areas where people look we are able to identify interesting regions; (ii) we show how by determining the gaze of people in the scene we can more effectively control a multi-camera surveillance system to acquire faces for identification; (iii) we show how by identifying where people are looking we can more effectively classify human interactions.

[1]  Andrew Zisserman,et al.  Pose search: Retrieving people using their pose , 2009, CVPR 2009.

[2]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[3]  Ian D. Reid,et al.  High Five: Recognising human interactions in TV shows , 2010, BMVC.

[4]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[5]  Takahiro Okabe,et al.  Gaze Estimation from Low Resolution Images , 2006, PSIVT.

[6]  Ian D. Reid,et al.  Guiding Visual Surveillance by Tracking Human Attention , 2009, BMVC.

[7]  Vincent Lepetit,et al.  Randomized trees for real-time keypoint recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  Antti Oulasvirta,et al.  Computer Vision – ECCV 2006 , 2006, Lecture Notes in Computer Science.

[10]  Ian D. Reid,et al.  Estimating Gaze Direction from Low-Resolution Faces in Video , 2006, ECCV.

[11]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[12]  Ian D. Reid,et al.  Colour Invariant Head Pose Classification in Low Resolution Video , 2008, BMVC.

[13]  Shaogang Gong,et al.  Head Pose Classification in Crowded Scenes , 2009, BMVC.

[14]  Luis Rueda,et al.  Advances in Image and Video Technology, Second Pacific Rim Symposium, PSIVT 2007, Santiago, Chile, December 17-19, 2007, Proceedings , 2007, PSIVT.

[15]  Eric Sommerlade,et al.  Gaze directed camera control for face image acquisition , 2011, 2011 IEEE International Conference on Robotics and Automation.

[16]  Eric Sommerlade,et al.  Probabilistic surveillance with multiple active cameras , 2010, 2010 IEEE International Conference on Robotics and Automation.

[17]  Ian Reid,et al.  fastHOG – a real-time GPU implementation of HOG , 2011 .