Neural Video Portrait Relighting in Real-time via Consistency Modeling

Video portrait relighting is critical in user-facing human photography, especially for immersive VR/AR experiences. Recent approaches still fail to recover consistent relit results under dynamic illumination from a monocular RGB stream, because they lack video consistency supervision. In this paper, we propose a neural approach for real-time, high-quality and coherent video portrait relighting, which jointly models semantic, temporal and lighting consistency using a new dynamic OLAT dataset. We propose a hybrid structure and lighting disentanglement in an encoder-decoder architecture, combined with a multi-task and adversarial training strategy for semantic-aware consistency modeling. We adopt a temporal modeling scheme via flow-based supervision to encode the conjugated temporal consistency in a cross manner. We also propose a lighting sampling strategy to model illumination consistency and mutation for natural portrait light manipulation in the real world. Extensive experiments demonstrate the effectiveness of our approach for consistent video portrait light-editing and relighting, even on mobile computing devices.
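
As a rough illustration of what flow-based temporal supervision can look like, the sketch below warps the previous relit frame into the current frame with a given optical flow and penalizes the remaining difference. It is a generic NumPy example and not the loss used in the paper: the function names `warp_with_flow` and `temporal_consistency_loss`, the nearest-neighbour sampling, the L1 penalty and the flow convention are all assumptions made for illustration, and the "cross manner" mentioned above is not modeled here.

```python
import numpy as np


def warp_with_flow(frame, flow):
    """Backward-warp `frame` (H, W, C) with `flow` (H, W, 2), nearest-neighbour.

    Assumed convention: flow[y, x] = (dx, dy) points from pixel (x, y) in the
    current frame to its source location in `frame` (the previous frame).
    Returns the warped frame and a boolean mask of pixels whose source lies
    inside the image.
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    # Clip so indexing is safe; out-of-range pixels are excluded via `valid`.
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)
    return frame[src_y, src_x], valid


def temporal_consistency_loss(relit_prev, relit_curr, flow):
    """Mean L1 difference between the current relit frame and the
    flow-warped previous relit frame, averaged over valid pixels only."""
    warped_prev, valid = warp_with_flow(relit_prev, flow)
    diff = np.abs(relit_curr - warped_prev).mean(axis=-1)
    n = valid.sum()
    return float((diff * valid).sum() / n) if n > 0 else 0.0


# Toy usage: the "current" frame is the previous one shifted right by a pixel.
prev = np.arange(60, dtype=float).reshape(4, 5, 3)
curr = np.roll(prev, 1, axis=1)
flow = np.zeros((4, 5, 2))
flow[..., 0] = -1.0  # each current pixel came from one pixel to its left
loss_with_flow = temporal_consistency_loss(prev, curr, flow)
loss_without_flow = temporal_consistency_loss(prev, curr, np.zeros((4, 5, 2)))
```

With the correct flow the loss vanishes on the valid pixels, while ignoring the motion leaves a positive residual. In a training setup a term like this would typically be written in a differentiable framework with bilinear sampling and an occlusion mask.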
