Every Smile is Unique: Landmark-Guided Diverse Smile Generation

Each smile is unique: one person surely smiles in different ways (e.g. closing/opening the eyes or mouth). Given one input image of a neutral face, can we generate multiple smile videos with distinctive characteristics? To tackle this one-to-many video generation problem, we propose a novel deep learning architecture named Conditional Multi-Mode Network (CMM-Net). To better encode the dynamics of facial expressions, CMM-Net explicitly exploits facial landmarks for generating smile sequences. Specifically, a variational auto-encoder is used to learn a facial landmark embedding. This single embedding is then exploited by a conditional recurrent network which generates a landmark embedding sequence conditioned on a specific expression (e.g. spontaneous smile). Next, the generated landmark embeddings are fed into a multi-mode recurrent landmark generator, producing a set of landmark sequences still associated to the given smile class but clearly distinct from each other. Finally, these landmark sequences are translated into face videos. Our experimental results demonstrate the effectiveness of our CMM-Net in generating realistic videos of multiple smile expressions.

[1]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[2]  Takeo Kanade,et al.  Recognizing Action Units for Facial Expression Analysis , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Nicu Sebe,et al.  Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction , 2017, NIPS.

[4]  Léon Bottou,et al.  Wasserstein GAN , 2017, ArXiv.

[5]  Jan Kautz,et al.  MoCoGAN: Decomposing Motion and Content for Video Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Antonio Torralba,et al.  Generating Videos with Scene Dynamics , 2016, NIPS.

[7]  Philip H. S. Torr,et al.  Multi-agent Diverse Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Peter Robinson,et al.  Constrained Local Neural Fields for Robust Facial Landmark Detection in the Wild , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[9]  Nicu Sebe,et al.  Recurrent Convolutional Face Alignment , 2016, ACCV.

[10]  Max Welling,et al.  Markov Chain Monte Carlo and Variational Inference: Bridging the Gap , 2014, ICML.

[11]  Joshua B. Tenenbaum,et al.  Deep Convolutional Inverse Graphics Network , 2015, NIPS.

[12]  Yongzhao Zhan,et al.  Multi-pose Facial Expression Recognition Using Transformed Dirichlet Process , 2016, ACM Multimedia.

[13]  Albert Ali Salah,et al.  Are You Really Smiling at Me? Spontaneous versus Posed Enjoyment Smiles , 2012, ECCV.

[14]  Mohammad H. Mahoor,et al.  Extended DISFA Dataset: Investigating Posed and Spontaneous Facial Expressions , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[15]  Mohammad H. Mahoor,et al.  DISFA: A Spontaneous Facial Action Intensity Database , 2013, IEEE Transactions on Affective Computing.

[16]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[17]  Masaki Saito,et al.  Temporal Generative Adversarial Nets , 2016, ArXiv.

[18]  Tamara L. Berg,et al.  Learning Temporal Transformations from Time-Lapse Videos , 2016, ECCV.

[19]  Nicu Sebe,et al.  Recurrent Face Aging with Hierarchical AutoRegressive Memory , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Alex Graves,et al.  DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[22]  David Zhang,et al.  Deep Identity-aware Transfer of Facial Attributes , 2016, ArXiv.

[23]  Shunta Saito,et al.  Temporal Generative Adversarial Nets with Singular Value Clipping , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Yong Man Ro,et al.  Subtle Facial Expression Recognition Using Adaptive Magnification of Discriminative Facial Motion , 2015, ACM Multimedia.

[25]  Rob Fergus,et al.  Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[26]  Hema Swetha Koppula,et al.  Anticipating Human Activities Using Object Affordances for Reactive Robotic Response , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Martial Hebert,et al.  An Uncertain Future: Forecasting from Static Images Using Variational Autoencoders , 2016, ECCV.

[28]  Nicu Sebe,et al.  Recurrent Convolutional Shape Regression , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Ole Winther,et al.  Autoencoding beyond pixels using a learned similarity metric , 2015, ICML.

[30]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  P. King-Smith,et al.  Luminance and opponent-color contributions to visual detection and adaptation and to temporal and spatial integration. , 1976, Journal of the Optical Society of America.

[32]  Peter Robinson,et al.  Cross-dataset learning and person-specific normalisation for automatic Action Unit detection , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[33]  Vishal M. Patel,et al.  GP-GAN: Gender Preserving GAN for Synthesizing Faces from Landmarks , 2017, 2018 24th International Conference on Pattern Recognition (ICPR).

[34]  Nitish Srivastava,et al.  Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.

[35]  Honglak Lee,et al.  Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.

[36]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[37]  Murray Shanahan,et al.  Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders , 2016, ArXiv.

[38]  Honglak Lee,et al.  Attribute2Image: Conditional Image Generation from Visual Attributes , 2015, ECCV.

[39]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[40]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[41]  Maja Pantic,et al.  Social signal processing: Survey of an emerging domain , 2009, Image Vis. Comput..

[42]  Xiaoou Tang,et al.  Automatic facial expression recognition on a single 3D face by exploring shape deformation , 2009, ACM Multimedia.

[43]  LinLin Shen,et al.  Deep Feature Consistent Variational Autoencoder , 2016, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[44]  Dumitru Erhan,et al.  Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Nicu Sebe,et al.  Learning Personalized Models for Facial Expression Analysis and Gesture Recognition , 2016, IEEE Transactions on Multimedia.

[47]  Eric P. Xing,et al.  Nonparametric Variational Auto-Encoders for Hierarchical Representation Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[48]  Ruben Villegas,et al.  Learning to Generate Long-term Future via Hierarchical Prediction , 2017, ICML.

[49]  Luc Van Gool,et al.  Pose Guided Person Image Generation , 2017, NIPS.

[50]  Hyunsoo Kim,et al.  Learning to Discover Cross-Domain Relations with Generative Adversarial Networks , 2017, ICML.

[51]  Alexei A. Efros,et al.  Generative Visual Manipulation on the Natural Image Manifold , 2016, ECCV.

[52]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[53]  Peter Robinson,et al.  OpenFace: An open source facial behavior analysis toolkit , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[54]  Nicu Sebe,et al.  Recurrent Face Aging , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Daniel Rueckert,et al.  Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Nicu Sebe,et al.  Viraliency: Pooling Local Virality , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.