Learning Facial Representations from the Cycle-consistency of Face

Faces manifest large variations in many aspects, such as identity, expression, pose, and face styling. Therefore, it is a great challenge to disentangle and extract these characteristics from facial images, especially in an unsupervised manner. In this work, we introduce cycle-consistency in facial characteristics as free supervisory signal to learn facial representations from unlabeled facial images. The learning is realized by superimposing the facial motion cycleconsistency and identity cycle-consistency constraints. The main idea of the facial motion cycle-consistency is that, given a face with expression, we can perform de-expression to a neutral face via the removal of facial motion and further perform re-expression to reconstruct back to the original face. The main idea of the identity cycle-consistency is to exploit both de-identity into mean face by depriving the given neutral face of its identity via feature re-normalization and re-identity into neutral face by adding the personal attributes to the mean face. At training time, our model learns to disentangle two distinct facial representations to be useful for performing cycle-consistent face reconstruction. At test time, we use the linear protocol scheme for evaluating facial representations on various tasks, including facial expression recognition and head pose regression. We also can directly apply the learnt facial representations to person recognition, frontalization and image-to-image translation. Our experiments show that the results of our approach is competitive with those of existing methods, demonstrating the rich and unique information embedded in the disentangled representations. Code is available at https: //github.com/JiaRenChang/FaceCycle.

[1]  Stefan Roth,et al.  MirrorFlow: Exploiting Symmetries in Joint Optical Flow and Occlusion Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Honghai Liu,et al.  Feature Selection Mechanism in CNNs for Facial Expression Recognition , 2018, BMVC.

[3]  Lijun Yin,et al.  Facial Expression Recognition by De-expression Residue Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Gaurav Sharma,et al.  Unsupervised Learning of Face Representations , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[5]  Serge J. Belongie,et al.  Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Junping Du,et al.  Reliable Crowdsourcing and Deep Locality-Preserving Learning for Expression Recognition in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[8]  Iasonas Kokkinos,et al.  Deforming Autoencoders: Unsupervised Disentangling of Shape and Appearance , 2018, ECCV.

[9]  M. Hasselmo,et al.  The role of expression and identity in the face-selective responses of neurons in the temporal visual cortex of the monkey , 1989, Behavioural Brain Research.

[10]  Ersin Yumer,et al.  Neural Face Editing with Intrinsic Image Disentangling , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[12]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Corneliu Florea,et al.  Annealed Label Transfer for Face Expression Recognition , 2019, BMVC.

[14]  Shiguang Shan,et al.  Self-Supervised Representation Learning From Videos for Facial Action Unit Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Liming Chen,et al.  3D-Aided Face Recognition Robust to Expression and Pose Variations , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  L Sirovich,et al.  Low-dimensional procedure for the characterization of human faces. , 1987, Journal of the Optical Society of America. A, Optics and image science.

[17]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[18]  Daniel Rueckert,et al.  Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Kaiming He,et al.  Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Jan Kautz,et al.  PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Tal Hassner,et al.  Effective face frontalization in unconstrained images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Juyong Zhang,et al.  Disentangled Representation Learning for 3D Face Shape , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Yung-Yu Chuang,et al.  FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation From a Single Image , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  R. Dolan,et al.  fMRI-adaptation reveals dissociable neural representations of identity and expression in face perception. , 2004, Journal of neurophysiology.

[25]  Francesc Moreno-Noguer,et al.  GANimation: Anatomically-aware Facial Animation from a Single Image , 2018, ECCV.

[26]  Joon Son Chung,et al.  VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.

[27]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Victor Lempitsky,et al.  Few-Shot Adversarial Learning of Realistic Neural Talking Head Models , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[30]  Weihong Deng,et al.  Cross-Pose LFW : A Database for Studying Cross-Pose Face Recognition in Unconstrained Environments , 2018 .

[31]  Stefanos Zafeiriou,et al.  300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[32]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[33]  Josephine Sullivan,et al.  One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Changsheng Xu,et al.  Joint Pose and Expression Modeling for Facial Expression Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[37]  Xiaoming Liu,et al.  Disentangled Representation Learning GAN for Pose-Invariant Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Andrew Zisserman,et al.  X2Face: A network for controlling face generation by using images, audio, and pose codes , 2018, ECCV.

[39]  Mohammad Soleymani,et al.  Self-Supervised Learning for Facial Action Unit Recognition through Temporal Consistency , 2020, BMVC.

[40]  Xianglei Xing,et al.  Unsupervised Disentangling of Appearance and Geometry by Deformable Generator Network , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Joon Son Chung,et al.  VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.

[42]  Yoshua Bengio,et al.  Challenges in representation learning: A report on three machine learning contests , 2013, Neural Networks.

[43]  J. Haxby,et al.  The distributed human neural system for face perception , 2000, Trends in Cognitive Sciences.

[44]  Yuting Zhang,et al.  Unsupervised Discovery of Object Landmarks as Structural Representations , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Michael J. Black,et al.  The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields , 1996, Comput. Vis. Image Underst..

[46]  Kenji Mase,et al.  Recognition of Facial Expression from Optical Flow , 1991 .

[47]  Bhiksha Raj,et al.  SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Chi-Keung Tang,et al.  Attribute-Guided Face Generation Using Conditional CycleGAN , 2017, ECCV.

[49]  Wei Shen,et al.  Learning Residual Images for Face Attribute Manipulation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Andrew Zisserman,et al.  Self-supervised learning of a facial attribute embedding from video , 2018, BMVC.

[51]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[52]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).