Faithful Face Image Completion for HMD Occlusion Removal

Head-mounted-displays (HMDs) provide immersive experiences of virtual content. While being flexible, HMDs could be a hindrance for Virtual Reality (VR) applications such as VR teleconference where facial components and expressions of the user are partially occluded thus cannot be seen by others. We present an automatic face image completion solution that treats the occluded region as a hole and completes the hole with the help of an occlusion-free reference image of the same person. Given the occluded input image and an occlusion-free reference image, our method first computes head pose features from estimated facial landmarks. The head pose features, as well as images, are then fed into a generative adversarial network (GAN) to synthesize the output image. Our method can generate faithful results from various input cases and outperforms other face completion methods. It provides a light-weighted solution to HMD occlusion removal and has the potential to benefit VR applications.

[1]  Masayuki Takemura,et al.  Diminishing head-mounted display for shared mixed reality , 2002, Proceedings. International Symposium on Mixed and Augmented Reality.

[2]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Patrick Pérez,et al.  Poisson image editing , 2003, ACM Trans. Graph..

[4]  Lucas Theis,et al.  Fast Face-Swap Using Convolutional Neural Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Justus Thies,et al.  Headon , 2018, ACM Trans. Graph..

[6]  Ran He,et al.  Geometry-Aware Face Completion and Editing , 2018, AAAI.

[7]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[8]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Thabo Beeler,et al.  Real-time high-fidelity facial performance capture , 2015, ACM Trans. Graph..

[10]  Patrick Pérez,et al.  Deep video portraits , 2018, ACM Trans. Graph..

[11]  Yuxiao Hu,et al.  MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition , 2016, ECCV.

[12]  Runze Liang,et al.  ShadowGAN: Shadow synthesis for virtual objects with conditional adversarial networks , 2019, Computational Visual Media.

[13]  Cristian Canton-Ferrer,et al.  Eye In-painting with Exemplar Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Ruigang Yang,et al.  Identity Preserving Face Completion for Large Ocular Region Occlusion , 2018, BMVC.

[15]  Matthias Zwicker,et al.  Faceshop , 2018, ACM Trans. Graph..

[16]  Shigeo Morishima,et al.  FSNet: An Identity-Aware Generative Model for Image-based Face Swapping , 2018, ACCV.

[17]  Ming-Hsuan Yang,et al.  Generative Face Completion , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Hiroshi Ishikawa,et al.  Globally and locally consistent image completion , 2017, ACM Trans. Graph..

[19]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[21]  Lin Gao,et al.  Real-Time 3D Face Reconstruction and Gaze Tracking for Virtual Reality , 2018, 2018 IEEE Conference on Virtual Reality and 3D User Interfaces (VR).

[22]  Justus Thies,et al.  FaceVR , 2018, ACM Trans. Graph..

[23]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[24]  Shi-Min Hu,et al.  Example-Guided Style-Consistent Image Synthesis From Semantic Labeling , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).