Semi-Supervised Learning for Monocular Gaze Redirection

We present a new approach to the problem of learning-based monocular gaze redirection in images. The approach can be trained on raw sequences of eye images with unknown gaze directions, together with a small number of eye images for which the gaze direction is known. It is based on a pair of deep networks: the first, an encoder-like network, maps eye images to a latent space, while the second maps pairs of latent representations to warping fields that implement the transformation between the corresponding pair of original images. Both networks are trained in an unsupervised manner, and the gaze-annotated images are used only to estimate displacements in the latent space that are characteristic of certain gaze redirections. Quantitative and qualitative evaluations suggest that such characteristic displacement vectors in the learned latent space can be learned from few examples and are transferable across different people and different imaging conditions.
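The role of the gaze-annotated images can be illustrated with a minimal sketch. It assumes that a characteristic displacement is estimated as the average latent-space difference over annotated image pairs that share the same gaze change; the abstract says only that displacements are estimated, so the averaging rule is an assumption. The function names `mean_displacement` and `redirect_latent` are hypothetical, and the latent codes are toy lists standing in for encoder outputs. The second network, which would turn the original and shifted latent codes into a warping field, is omitted.

```python
def mean_displacement(pairs):
    """Estimate a characteristic latent displacement for one gaze redirection.

    pairs: list of (z_src, z_dst) latent codes, one per annotated image pair
    that exhibits the same known gaze change. Averaging the differences is an
    assumption made for this sketch, not a rule stated in the abstract.
    """
    dim = len(pairs[0][0])
    return [
        sum(z_dst[i] - z_src[i] for z_src, z_dst in pairs) / len(pairs)
        for i in range(dim)
    ]


def redirect_latent(z, displacement):
    """Shift the latent code of a new eye image by the estimated displacement."""
    return [zi + di for zi, di in zip(z, displacement)]


# Toy latent codes standing in for encoder outputs (hypothetical values).
annotated_pairs = [
    ([0.0, 1.0], [1.0, 1.0]),
    ([2.0, 0.0], [3.0, 2.0]),
]
d = mean_displacement(annotated_pairs)
z_target = redirect_latent([0.5, 0.5], d)
```

In a full system, the pair formed by the original latent code and `z_target` would be passed to the second network to obtain the warping field applied to the input image.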
