Guiding Visual Surveillance by Tracking Human Attention

We describe a novel method for directing the attention of an automated surveillance system. Our starting premise is that the attention of people in a scene can be used as an indicator of interesting areas and events. To determine people's attention from passive visual observations, we develop a system that automatically detects and tracks individual heads and infers each person's gaze direction. Detection and tracking are achieved by combining a histogram of oriented gradients (HOG) based head detector with frame-to-frame tracking using multiple point features to provide stable head images. Gaze inference is achieved using a head pose classifier that uses randomised ferns, with decision branches based on both HOG and colour-based features, to determine a coarse gaze direction for each person in the scene. By building both static and temporally varying maps of the areas where people look, we are able to identify interesting regions.
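As a rough illustration of the final step, the sketch below accumulates attention maps from per-person head positions and coarse gaze classes. It is not the authors' implementation: the abstract does not say how the maps are built, so the 2D image-plane grid, the eight 45-degree gaze classes, the ray casting, the exponential decay used for the temporally varying map, and every function and parameter name here are assumptions made for illustration.

```python
import numpy as np

# Assumption: 8 coarse gaze classes at 45-degree steps in the image plane.
N_DIRECTIONS = 8


def direction_vector(gaze_class):
    """Unit vector (dx, dy) for a coarse gaze class; class 0 points along +x."""
    angle = 2.0 * np.pi * gaze_class / N_DIRECTIONS
    return np.cos(angle), np.sin(angle)


def cast_gaze(attention, head_xy, gaze_class, max_range=50, weight=1.0):
    """Add one vote to every grid cell on a ray leaving the head position."""
    height, width = attention.shape
    dx, dy = direction_vector(gaze_class)
    visited = set()
    for step in range(1, max_range + 1):
        x = int(round(head_xy[0] + step * dx))
        y = int(round(head_xy[1] + step * dy))
        if not (0 <= x < width and 0 <= y < height):
            break  # the ray has left the map
        if (x, y) not in visited:  # vote at most once per cell per ray
            visited.add((x, y))
            attention[y, x] += weight
    return attention


def update_maps(static_map, recent_map, observations, decay=0.9):
    """Update both maps in place with one frame of (head_xy, gaze_class) pairs.

    static_map accumulates votes over the whole sequence; recent_map is
    decayed every frame so that it emphasises recent attention.
    """
    recent_map *= decay
    for head_xy, gaze_class in observations:
        cast_gaze(static_map, head_xy, gaze_class)
        cast_gaze(recent_map, head_xy, gaze_class)
    return static_map, recent_map


# Usage: two people whose gaze rays cross at one cell of a 20x20 map.
static_map = np.zeros((20, 20))
recent_map = np.zeros((20, 20))
frame = [((2, 5), 0), ((8, 0), 2)]  # (head position (x, y), gaze class)
update_maps(static_map, recent_map, frame)
hotspot = np.unravel_index(np.argmax(static_map), static_map.shape)  # (row, col)
```

Cells where several gaze rays cross collect the most votes, which is one simple way to surface a candidate region of interest. A fuller system would likely weight votes by classifier confidence and project gaze onto the ground plane rather than the image plane.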
