Structured Output SVM Prediction of Apparent Age, Gender and Smile from Deep Features

We propose structured output SVM for predicting the apparent age as well as gender and smile from a single face image represented by deep features. We pose the problem of apparent age estimation as an instance of the multi-class structured output SVM classifier followed by a softmax expected value refinement. The gender and smile predictions are treated as binary classification problems. The proposed solution first detects the face in the image and then extracts deep features from the cropped image around the detected face. We use a convolutional neural network with VGG-16 architecture [25] for learning deep features. The network is pretrained on the ImageNet [24] database and then fine-tuned on IMDB-WIKI [21] and ChaLearn 2015 LAP datasets [8]. We validate our methods on the ChaLearn 2016 LAP dataset [9]. Our structured output SVMs are trained solely on ChaLearn 2016 LAP data. We achieve excellent results for both apparent age prediction and gender and smile classification.

[1]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[5]  Luc Van Gool,et al.  Face Detection without Bells and Whistles , 2014, ECCV.

[6]  Xiu-Shen Wei,et al.  Deep Spatial Pyramid: The Devil is Once Again in the Details , 2015, ArXiv.

[7]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[8]  Anil K. Jain,et al.  Age estimation from face images: Human vs. machine performance , 2013, 2013 International Conference on Biometrics (ICB).

[9]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[10]  Václav Hlavác,et al.  Real-time multi-view facial landmark detector learned by the structured output SVM , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[11]  Sören Sonnenburg,et al.  Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization , 2009, J. Mach. Learn. Res..

[12]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[13]  Xin Liu,et al.  AgeNet: Deeply Learned Regressor and Classifier for Robust Apparent Age Estimation , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[14]  Alexander J. Smola,et al.  Bundle Methods for Regularized Risk Minimization , 2010, J. Mach. Learn. Res..

[15]  Jiri Matas,et al.  WaldBoost - learning for time constrained sequential detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Tal Hassner,et al.  Age and Gender Estimation of Unfiltered Faces , 2014, IEEE Transactions on Information Forensics and Security.

[17]  Yan Li,et al.  A Study on Apparent Age Estimation , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[18]  Anton van den Hengel,et al.  The treasure beneath convolutional layers: Cross-convolutional-layer pooling for image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2015, IEEE Trans. Pattern Anal. Mach. Intell..

[20]  Sergio Escalera,et al.  ChaLearn Looking at People and Faces of the World: Face AnalysisWorkshop and Challenge 2016 , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[21]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[22]  Robert Laganière,et al.  Real-time embedded age and gender classification in unconstrained video , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[23]  Shiguang Shan,et al.  Two Birds, One Stone: Jointly Learning Binary Code for Large-Scale Face Image Retrieval and Attributes Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[24]  Shengcai Liao,et al.  Learning Face Representation from Scratch , 2014, ArXiv.

[25]  Dimitri P. Bertsekas,et al.  Nonlinear Programming , 1997 .

[26]  Luc Van Gool,et al.  Some Like It Hot — Visual Guidance for Preference Prediction , 2016, CVPR 2016.

[27]  Václav Hlavác,et al.  V-shaped interval insensitive loss for ordinal classification , 2015, Machine Learning.

[28]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[29]  Sergio Escalera,et al.  ChaLearn Looking at People 2015: Apparent Age and Cultural Event Recognition Datasets and Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[30]  Luc Van Gool,et al.  DEX: Deep EXpectation of Apparent Age from a Single Image , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).