Reconstruction-Based Disentanglement for Pose-Invariant Face Recognition

Deep neural networks (DNNs) trained on large-scale datasets have recently achieved impressive improvements in face recognition. But a persistent challenge remains to develop methods capable of handling large pose variations that are relatively under-represented in training data. This paper presents a method for learning a feature representation that is invariant to pose, without requiring extensive pose coverage in training data. We first propose to generate non-frontal views from a single frontal face, in order to increase the diversity of training data while preserving accurate facial details that are critical for identity discrimination. Our next contribution is to seek a rich embedding that encodes identity features, as well as non-identity ones such as pose and landmark locations. Finally, we propose a new feature reconstruction metric learning to explicitly disentangle identity and pose, by demanding alignment between the feature reconstructions through various combinations of identity and pose features, which is obtained from two images of the same subject. Experiments on both controlled and in-the-wild face datasets, such as MultiPIE, 300WLP and the profile view database CFP, show that our method consistently outperforms the state-of-the-art, especially on images with large head pose variations.

[1]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[2]  Tien D. Bui,et al.  Beyond Principal Components: Deep Boltzmann Machines for face modeling , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  I. Biederman,et al.  Dynamic binding in a neural network for shape recognition. , 1992, Psychological review.

[4]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Xiaoming Liu,et al.  Disentangled Representation Learning GAN for Pose-Invariant Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Feng Zhou,et al.  Deep Deformation Network for Object Landmark Localization , 2016, ECCV.

[7]  Shiguang Shan,et al.  Stacked Progressive Auto-Encoders (SPAE) for Face Recognition Across Poses , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  T. Poggio,et al.  A network that learns to recognize three-dimensional objects , 1990, Nature.

[9]  Xiaoming Liu,et al.  Coefficients Pose-Variant Input Recogni 8 on Engine Frontalized Output Generator FF-GAN D Discriminator Extreme Pose Input Frontalized Output , 2017 .

[10]  Xiaogang Wang,et al.  Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Yuting Zhang,et al.  Learning to Disentangle Factors of Variation with Manifold Interaction , 2014, ICML.

[12]  S. Edelman,et al.  Orientation dependence in the recognition of familiar and novel views of three-dimensional objects , 1992, Vision Research.

[13]  Kihyuk Sohn,et al.  Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[14]  Thomas Vetter,et al.  Face Recognition Based on Fitting a 3D Morphable Model , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[16]  Rama Chellappa,et al.  Fisher vector encoded deep convolutional features for unconstrained face verification , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[17]  Ira Kemelmacher-Shlizerman,et al.  Head Reconstruction from Internet Photos , 2016, ECCV.

[18]  Gang Hua,et al.  Hierarchical-PEP model for real-world face recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Yiying Tong,et al.  Unconstrained 3D face reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[21]  Xiangyu Zhu,et al.  High-fidelity Pose and Expression Normalization for face recognition in the wild , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Allan Aasbjerg Nielsen,et al.  Multiset canonical correlations analysis and multispectral, truly multitemporal remote sensing data , 2002, IEEE Trans. Image Process..

[23]  Carlos D. Castillo,et al.  Frontal to profile face verification in the wild , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[24]  John Shawe-Taylor,et al.  Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.

[25]  Joshua B. Tenenbaum,et al.  Deep Convolutional Inverse Graphics Network , 2015, NIPS.

[26]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Tal Hassner,et al.  Regressing Robust and Discriminative 3D Morphable Models with a Very Deep Neural Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Xiaogang Wang,et al.  Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations , 2014, NIPS.

[29]  Tal Hassner,et al.  Do We Really Need to Collect Millions of Faces for Effective Face Recognition? , 2016, ECCV.

[30]  Shiguang Shan,et al.  Multi-view Deep Network for Cross-View Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[33]  David W. Jacobs,et al.  Generalized Multiview Analysis: A discriminative latent space , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[35]  Shengcai Liao,et al.  Learning Face Representation from Scratch , 2014, ArXiv.

[36]  Junzhou Huang,et al.  Face Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Model , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Stefanos Zafeiriou,et al.  300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[38]  Yang Yu,et al.  Toward Personalized Modeling: Incremental and Ensemble Alignment for Sequential Faces in the Wild , 2018, International Journal of Computer Vision.

[39]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[40]  Dimitris N. Metaxas,et al.  Consensus of Regression for Occlusion-Robust Facial Feature Localization , 2014, ECCV.

[41]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[42]  Tal Hassner,et al.  Effective face frontalization in unconstrained images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Shiguang Shan,et al.  Multi-View Discriminant Analysis , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Gérard G. Medioni,et al.  Pose-Aware Face Recognition in the Wild , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Pascal Vincent,et al.  Disentangling Factors of Variation for Facial Expression Recognition , 2012, ECCV.

[46]  Carlos D. Castillo,et al.  Triplet probabilistic embedding for face verification and clustering , 2016, 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[47]  Xiaogang Wang,et al.  Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[48]  Ahmed M. Elgammal,et al.  From circle to 3-sphere: Head pose estimation by instance parameterization , 2015, Comput. Vis. Image Underst..

[49]  Sami Romdhani,et al.  A 3D Face Model for Pose and Illumination Invariant Face Recognition , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.

[50]  Xiaolong Wang,et al.  Leveraging multiple cues for recognizing family photos , 2017, Image Vis. Comput..

[51]  Xiaogang Wang,et al.  Deep Learning Identity-Preserving Face Space , 2013, 2013 IEEE International Conference on Computer Vision.