Disentangling Factors of Variation for Facial Expression Recognition

We propose a semi-supervised approach to solve the task of emotion recognition in 2D face images using recent ideas in deep learning for handling the factors of variation present in data. An emotion classification algorithm should be both robust to (1) remaining variations due to the pose of the face in the image after centering and alignment, (2) the identity or morphology of the face. In order to achieve this invariance, we propose to learn a hierarchy of features in which we gradually filter the factors of variation arising from both (1) and (2). We address (1) by using a multi-scale contractive convolutional network (CCNET) in order to obtain invariance to translations of the facial traits in the image. Using the feature representation produced by the CCNET, we train a Contractive Discriminative Analysis (CDA) feature extractor, a novel variant of the Contractive Auto-Encoder (CAE), designed to learn a representation separating out the emotion-related factors from the others (which mostly capture the subject identity, and what is left of pose after the CCNET). This system beats the state-of-the-art on a recently proposed dataset for facial expression recognition, the Toronto Face Database, moving the state-of-art accuracy from 82.4% to 85.0%, while the CCNET and CDA improve accuracy of a standard CAE by 8%.

[1]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[2]  R. Fisher THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS , 1936 .

[3]  M. Bartlett Further aspects of the theory of multiple regression , 1938, Mathematical Proceedings of the Cambridge Philosophical Society.

[4]  S. Wold,et al.  The Collinearity Problem in Linear Regression. The Partial Least Squares (PLS) Approach to Generalized Inverses , 1984 .

[5]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[6]  John C. Platt,et al.  Postal Address Block Location Using a Convolutional Locator Network , 1993, NIPS.

[7]  G. Cottrell,et al.  A Simple Neural Network Models Categorical Perception of Facial Expressions , 1998 .

[8]  G. Cottrell,et al.  EMPATH: A Neural Network that Categorizes Facial Expressions , 2002, Journal of Cognitive Neuroscience.

[9]  Lawrence K. Saul,et al.  Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifold , 2003, J. Mach. Learn. Res..

[10]  M. Barker,et al.  Partial least squares for discrimination , 2003 .

[11]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[12]  Thomas Hofmann,et al.  Greedy Layer-Wise Training of Deep Networks , 2007 .

[13]  Marc'Aurelio Ranzato,et al.  Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[15]  Yoshua Bengio,et al.  Classification using discriminative restricted Boltzmann machines , 2008, ICML '08.

[16]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[17]  Yann LeCun,et al.  Convolutional Learning of Spatio-temporal Features , 2010, ECCV.

[18]  Y-Lan Boureau,et al.  Learning Convolutional Feature Hierarchies for Visual Recognition , 2010, NIPS.

[19]  Pascal Vincent,et al.  Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.

[20]  Yoshua Bengio,et al.  Unsupervised Models of Images by Spikeand-Slab RBMs , 2011, ICML.

[21]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[22]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[23]  Geoffrey E. Hinton,et al.  On deep generative models with applications to recognition , 2011, CVPR 2011.

[24]  Hossein Mobahi,et al.  Deep Learning via Semi-supervised Embedding , 2012, Neural Networks: Tricks of the Trade.