Joint multi-feature fusion and attribute relationships for facial attribute prediction

Predicting facial attributes from images in the wild is challenging because of complex face variations. The key to this problem is to construct rich facial representations and to exploit relationships among attributes. In this paper, we propose a novel multi-task convolutional neural network (MTCNN) and a supervision signal called Online Batch Relation Loss (OBRL) for face attribute prediction in the wild. MTCNN builds informative facial features by embedding identity, age and race features from IdentityNet, AgeNet and RaceNet, respectively. OBRL reduces the distribution shift of attribute relationships by mining attribute correlations within each minibatch and penalizing the probability divergence between pairs of attributes. To learn discriminative attribute features, we feed AttributeNet with the fused facial features and partition the attributes into nine groups, so that attributes in the same group share features and redundant computation is reduced. Finally, AttributeNet is optimized under the joint supervision of Cross Entropy Loss and OBRL. Experiments on CelebA and LFWA show that the proposed method outperforms state-of-the-art methods by a significant margin.
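The abstract does not give the OBRL formula, so the following is only a minimal sketch of one way a batch-level pairwise relation penalty could be combined with cross entropy. Everything in it is an assumption made for illustration: the function names, the use of Pearson correlation of the minibatch labels as the mined "relation", the squared-difference form of the divergence, and the weight `lam`. It should not be read as the authors' loss.

```python
import numpy as np


def batch_relation_penalty(probs, labels, eps=1e-8):
    """Hypothetical pairwise relation penalty (NOT the paper's exact OBRL).

    probs:  (batch, n_attr) predicted probabilities in [0, 1]
    labels: (batch, n_attr) binary ground-truth labels
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    batch, n_attr = labels.shape

    # Assumption: attribute relations are mined as the Pearson correlation
    # of the label columns, estimated on the current minibatch only.
    centered = labels - labels.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / batch
    std = np.sqrt(np.diag(cov))
    corr = cov / (np.outer(std, std) + eps)
    np.fill_diagonal(corr, 0.0)  # ignore self-pairs

    pos = np.clip(corr, 0.0, None)   # strength of positive relations
    neg = np.clip(-corr, 0.0, None)  # strength of negative relations

    p_i = probs[:, :, None]
    p_j = probs[:, None, :]
    agree = (p_i - p_j) ** 2         # positively related pairs should match
    oppose = (p_i + p_j - 1.0) ** 2  # negatively related pairs should oppose

    penalty = pos * agree + neg * oppose
    n_pairs = max(n_attr * (n_attr - 1), 1)
    return penalty.sum(axis=(1, 2)).mean() / n_pairs


def binary_cross_entropy(probs, labels, eps=1e-8):
    """Mean binary cross entropy over all samples and attributes."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    y = np.asarray(labels, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def joint_loss(probs, labels, lam=1.0):
    """Cross entropy plus the relation penalty; lam is a hypothetical weight."""
    return binary_cross_entropy(probs, labels) + lam * batch_relation_penalty(probs, labels)


# Two attributes that always co-occur in this toy minibatch.
labels = np.array([[1, 1], [0, 0], [1, 1], [0, 0]])
consistent = labels.astype(float)          # predictions respect the relation
inconsistent = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
print(joint_loss(consistent, labels), joint_loss(inconsistent, labels))
```

In this toy batch the penalty is zero when the predictions for the two co-occurring attributes agree and grows when they diverge, which is the qualitative behavior the abstract attributes to OBRL. Because the correlation is recomputed per minibatch, the penalty follows the statistics of the current batch rather than a fixed, dataset-wide prior.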
