A Novel Multi-purpose Deep Architecture for Facial Attribute and Emotion Understanding

Facial expression estimation has been studied for years, benefiting a wide array of application areas ranging from information retrieval and sentiment analysis to video surveillance and emotion analysis. Deep architectures proposed for facial attribute recognition yield high accuracies; however, fewer efforts have focused on the performance of these architectures. In this work, we use Squeeze-Net [6] for the first time in the literature to perform facial emotion recognition, benchmarked on the Celeb-A and AffectNet datasets. We extend Squeeze-Net by introducing a new \(5 \times 5\) convolution kernel after the last fully-connected layer offered by Squeeze-Net, merging the \(1 \times 1\) and \(3 \times 3\) outputs from the last fully-connected layers, to perform more domain-specific feature extraction. We run extensive experiments on these two widely used datasets, using AlexNet and Squeeze-Net in addition to our proposed architecture. The proposed architecture yields results in line with the state of the art while being simpler and less complex: it reports accuracies of 90.47% in Attribute Prediction and 56.38% in Expression Prediction, compared to 90.94% and 52.36%, respectively, for the state of the art.
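To make the complexity trade-off of an added 5 x 5 kernel concrete, the following is a minimal sketch that counts the learnable parameters of a Squeeze-Net-style block (a 1 x 1 squeeze convolution followed by parallel 1 x 1 and 3 x 3 convolutions) and of the same block with a 5 x 5 convolution applied to the merged 1 x 1 and 3 x 3 outputs. This is only one possible reading of the extension described above, not the authors' implementation. The function names and all channel widths (512, 64, 256, 256, 64) are assumptions chosen for illustration; only the standard formula for convolution parameters, k * k * c_in * c_out plus one bias per output channel, is taken as given.

```python
def conv_params(kernel, c_in, c_out, bias=True):
    """Learnable parameters of a kernel x kernel convolution layer."""
    weights = kernel * kernel * c_in * c_out
    return weights + (c_out if bias else 0)


def fire_params(c_in, squeeze, expand1x1, expand3x3):
    """Parameters of a Squeeze-Net-style block: a 1x1 squeeze convolution
    feeding parallel 1x1 and 3x3 convolutions."""
    return (
        conv_params(1, c_in, squeeze)
        + conv_params(1, squeeze, expand1x1)
        + conv_params(3, squeeze, expand3x3)
    )


def extended_block_params(c_in, squeeze, expand1x1, expand3x3, out5x5):
    """Same block plus a 5x5 convolution over the concatenated 1x1 and 3x3
    outputs (hypothetical reading of the extension, for illustration)."""
    merged_channels = expand1x1 + expand3x3  # channel-wise concatenation
    return (
        fire_params(c_in, squeeze, expand1x1, expand3x3)
        + conv_params(5, merged_channels, out5x5)
    )


if __name__ == "__main__":
    # Channel widths below are assumed values, not taken from the paper.
    base = fire_params(512, 64, 256, 256)
    extended = extended_block_params(512, 64, 256, 256, out5x5=64)
    print("base block parameters:    ", base)
    print("extended block parameters:", extended)
    print("added by the 5x5 kernel:  ", extended - base)
```

Because the cost of the 5 x 5 layer grows linearly with both its input and output channel counts, the number of output channels given to it (here the hypothetical `out5x5`) largely determines how much the extension adds to the model size.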

[1] Trevor Darrell, et al. PANDA: Pose Aligned Networks for Deep Attribute Modeling, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2] Forrest N. Iandola, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size, 2016, ArXiv.

[3] Mohammad H. Mahoor, et al. AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild, 2017, IEEE Transactions on Affective Computing.

[4] Gang Wang, et al. Multi-Task CNN Model for Attribute Prediction, 2015, IEEE Transactions on Multimedia.

[5] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.

[6] Rama Chellappa, et al. Attributes for Improved Attributes: A Multi-Task Network Utilizing Implicit and Explicit Relationships for Facial Attribute Classification, 2017, AAAI.

[7] Shiguang Shan, et al. Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8] Terrance E. Boult, et al. MOON: A Mixed Objective Optimization Network for the Recognition of Facial Attributes, 2016, ECCV.

[9] Yang Zhong, et al. Leveraging mid-level deep representations for predicting face attributes in the wild, 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[10] Xiaogang Wang, et al. Deep Learning Face Attributes in the Wild, 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[11] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[12] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.