Head Pose Classification in Crowded Scenes

We propose a novel technique for head pose classification in crowded public spaces under poor lighting and in low-resolution video images. Unlike previous approaches, we avoid the need for explicit segmentation of skin and hair regions from a head image, and we implicitly encode spatial information using a grid map for greater robustness on low-resolution images. Specifically, a new head pose descriptor is formulated using similarity distance maps, obtained by indexing each pixel of a head image to the mean appearance templates of head images at different poses. These distance feature maps are then used to train a multi-class Support Vector Machine for pose classification. Our approach is evaluated against established techniques [3, 13, 14] using the i-LIDS underground scene dataset [9] under challenging lighting and viewing conditions. The results demonstrate that our model gives a significant improvement in head pose estimation accuracy, achieving a pose recognition rate of over 80% compared with 32% for the best of the existing models.
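
The descriptor outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the grid size, or the image representation, so the per-pixel absolute difference, the 2x2 grid, the grayscale toy images, and all function names below are assumptions chosen for illustration only.

```python
from statistics import fmean

def mean_template(images):
    """Pixel-wise mean of equally sized grayscale images (lists of rows)."""
    h, w = len(images[0]), len(images[0][0])
    return [[fmean(img[r][c] for img in images) for c in range(w)]
            for r in range(h)]

def distance_map(image, template):
    """Per-pixel absolute difference to one pose template (assumed measure)."""
    return [[abs(p - t) for p, t in zip(row, trow)]
            for row, trow in zip(image, template)]

def grid_pool(dmap, cells):
    """Average a distance map over a cells x cells grid to keep coarse layout."""
    h, w = len(dmap), len(dmap[0])
    pooled = []
    for gr in range(cells):
        for gc in range(cells):
            rows = range(gr * h // cells, (gr + 1) * h // cells)
            cols = range(gc * w // cells, (gc + 1) * w // cells)
            pooled.append(fmean(dmap[r][c] for r in rows for c in cols))
    return pooled

def pose_descriptor(image, templates, cells=2):
    """Concatenate grid-pooled distance maps, one per pose (sorted by name)."""
    feats = []
    for pose in sorted(templates):
        feats.extend(grid_pool(distance_map(image, templates[pose]), cells))
    return feats

# Toy 4x4 "head images": bright left half vs. bright right half (hypothetical data).
left_imgs = [[[200, 200, 0, 0]] * 4, [[220, 220, 20, 20]] * 4]
right_imgs = [[[0, 0, 200, 200]] * 4, [[20, 20, 220, 220]] * 4]
templates = {"left": mean_template(left_imgs),
             "right": mean_template(right_imgs)}

query = [[210, 210, 10, 10]] * 4
descriptor = pose_descriptor(query, templates)
print(descriptor)
```

In this toy case the first four values (distance to the "left" template) are small and the last four (distance to the "right" template) are large. Descriptors of this kind, computed for labelled training images, would then serve as the input vectors for a multi-class Support Vector Machine, which is not shown here.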

[1] Shaogang Gong, et al. Composite support vector machines for detection of faces across views and pose estimation, 2002, Image Vis. Comput.

[2] Ian D. Reid, et al. Colour Invariant Head Pose Classification in Low Resolution Video, 2008, BMVC.

[3] i-LIDS Team, et al. Imagery Library for Intelligent Detection Systems (i-LIDS); A Standard for Testing Video Based Detection Systems, 2006, Proceedings 40th Annual 2006 International Carnahan Conference on Security Technology.

[4] Sharath Pankanti, et al. Absolute head pose estimation from overhead wide-angle cameras, 2003, 2003 IEEE International SOI Conference. Proceedings (Cat. No.03CH37443).

[5] David Beymer, et al. Face recognition under varying pose, 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[6] T. Poggio, et al. Direction estimation of pedestrian from multiple still images, 2004, IEEE Intelligent Vehicles Symposium, 2004.

[7] Rameswar Debnath, et al. A decision based one-against-one method for multi-class support vector machine, 2004, Pattern Analysis and Applications.

[8] Cordelia Schmid, et al. Human Detection Using Oriented Histograms of Flow and Appearance, 2006, ECCV.

[9] Lisa M. Brown, et al. Comparative study of coarse head pose estimation, 2002, Workshop on Motion and Video Computing, 2002. Proceedings.

[10] Rainer Stiefelhagen, et al. Multi-view head pose estimation using neural networks, 2005, The 2nd Canadian Conference on Computer and Robot Vision (CRV'05).

[11] William T. Freeman, et al. Example-based head tracking, 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[12] Shaogang Gong, et al. An investigation into face pose distributions, 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[13] Ian D. Reid, et al. Estimating Gaze Direction from Low-Resolution Faces in Video, 2006, ECCV.

[14] John Shawe-Taylor, et al. Tighter PAC-Bayes Bounds, 2006, NIPS.

[15] Shaogang Gong, et al. Face distributions in similarity space under varying head pose, 2001, Image Vis. Comput.

[16] Mohan M. Trivedi, et al. Head Pose Estimation in Computer Vision: A Survey, 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.