Local Learning to Improve Bag of Visual Words Model for Facial Expression Recognition

In this paper we propose a novel computer vision method for classifying human facial expression from low resolution images. Our method uses the bag of words representation. It extracts dense SIFT descriptors either from the whole image or from a spatial pyramid that divides the image into increasingly fine sub-regions. Then, it represents images as normalized (spatial) presence vectors of visual words from a codebook obtained through clustering image descriptors. Linear kernels are built for several choices of spatial presence vectors, and combined into weighted sums for multiple kernel learning (MKL). For machine learning, the method makes use of multi-class one-versus-all SVM on the MKL kernel computed using this representation, but with an important twist, the learning is local, as opposed to global – in the sense that, for each face with an unknown label, a set of neighbors is selected to build a local classification model, which is eventually used to classify only that particular face. Empirical results indicate that the use of presence vectors, local learning and spatial information improve recognition performance together by more than 5%. Finally, the proposed model ranked fourth in the Facial Expression Recognition Challenge, with an accuracy of 67.484% on the final test set. ICML 2013 Workshop on Representation Learning, Atlanta, Georgia, USA, 2013. Copyright 2013 by the author(s).

[1]  C. Darwin The Expression of the Emotions in Man and Animals , .

[2]  Vladimir Vapnik,et al.  Chervonenkis: On the uniform convergence of relative frequencies of events to their probabilities , 1971 .

[3]  Léon Bottou,et al.  Local Learning Algorithms , 1992, Neural Computation.

[4]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[5]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[6]  Jitendra Malik,et al.  Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons , 2001, International Journal of Computer Vision.

[7]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[8]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[9]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  Cordelia Schmid,et al.  A maximum entropy framework for part-based texture and object recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[11]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[12]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[13]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[14]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[16]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Andrew Zisserman,et al.  Image Classification using Random Forests and Ferns , 2007, 2007 IEEE 11th International Conference on Computer Vision.