Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation

[1]  D. Ramanan,et al.  Depth-supervised NeRF: Fewer Views and Faster Training for Free , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Chen Change Loy,et al.  Everybody’s Talkin’: Let Me Talk as You Want , 2020, IEEE Transactions on Information Forensics and Security.

[3]  Bo Dai,et al.  Generative Occupancy Fields for 3D Surface-Aware Image Synthesis , 2021, NeurIPS.

[4]  Haozhe Wu,et al.  Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis , 2021, ACM Multimedia.

[5]  Deva Ramanan,et al.  NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild , 2021, NeurIPS.

[6]  Hujun Bao,et al.  Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Shuai Yi,et al.  CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Ravi Ramamoorthi,et al.  NeLF: Neural Light-transport Field for Portrait View Synthesis and Relighting , 2021, EGSR.

[9]  Changjie Fan,et al.  Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion , 2021, IJCAI.

[10]  Jonathan T. Barron,et al.  HyperNeRF , 2021, ACM Trans. Graph..

[11]  Yaron Lipman,et al.  Volume Rendering of Neural Implicit Surfaces , 2021, NeurIPS.

[12]  Taku Komura,et al.  NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction , 2021, ArXiv.

[13]  Christian Theobalt,et al.  Neural actor , 2021, ACM Trans. Graph..

[14]  Yu Ding,et al.  Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Moustafa Meshry,et al.  Learned Spatial Representations for Few-shot Talking-Head Synthesis , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Chen Change Loy,et al.  Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Xun Cao,et al.  Audio-Driven Emotional Video Portraits , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Stephen Lin,et al.  Neural Articulated Radiance Field , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Angela Dai,et al.  NPMs: Neural Parametric Models for 3D Deformable Shapes , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Stefan Leutenegger,et al.  In-Place Scene Labelling and Understanding with Implicit Scene Representation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Pratul P. Srinivasan,et al.  Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  H. Bao,et al.  AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Amit Raj,et al.  Pixel-aligned Volumetric Avatars , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Hujun Bao,et al.  Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Xiaolong Wang,et al.  Learning Continuous Image Representation with Local Implicit Image Function , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Justus Thies,et al.  Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Justus Thies,et al.  Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Arun Mallya,et al.  One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Francesc Moreno-Noguer,et al.  D-NeRF: Neural Radiance Fields for Dynamic Scenes , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jonathan T. Barron,et al.  Nerfies: Deformable Neural Radiance Fields , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Andreas Geiger,et al.  GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Yaser Sheikh,et al.  Audio- and Gaze-driven Facial Animation of Codec Avatars , 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).

[33]  Yun-Ta Tsai,et al.  Neural Light Transport for Relighting and View Synthesis , 2020, ACM Transactions on Graphics.

[34]  Jonathan T. Barron,et al.  NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Hujun Bao,et al.  Animatable Neural Radiance Fields for Human Body Modeling , 2021, ArXiv.

[36]  C. V. Jawahar,et al.  A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild , 2020, ACM Multimedia.

[37]  Dipanjan Das,et al.  Speech-Driven Facial Animation Using Cascaded GANs for Learning of Motion and Texture , 2020, ECCV.

[38]  Kyaw Zaw Lin,et al.  Neural Sparse Voxel Fields , 2020, NeurIPS.

[39]  Chenliang Xu,et al.  Talking-head Generation with Rhythmic Head Motion , 2020, ECCV.

[40]  Yang Zhou,et al.  MakeltTalk , 2020, ACM Trans. Graph..

[41]  Gordon Wetzstein,et al.  Inferring Semantic Information with 3D Neural Scene Representations , 2020, ArXiv.

[42]  Ronen Basri,et al.  Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance , 2020, NeurIPS.

[43]  Pratul P. Srinivasan,et al.  NeRF , 2020, ECCV.

[44]  Marc Pollefeys,et al.  Convolutional Occupancy Networks , 2020, ECCV.

[45]  Hujun Bao,et al.  Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose , 2020, 2002.10137.

[46]  Y. Lipman,et al.  Implicit Geometric Regularization for Learning Shapes , 2020, ICML.

[47]  Andreas Geiger,et al.  Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Justus Thies,et al.  Neural Voice Puppetry: Audio-driven Facial Reenactment , 2019, ECCV.

[49]  Yinda Zhang,et al.  DIST: Rendering Deep Implicit Signed Distance Function With Differentiable Sphere Tracing , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Lingyun Wu,et al.  MaskGAN: Towards Diverse and Interactive Facial Image Manipulation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Yi Li,et al.  Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning , 2018, IJCAI.

[52]  Yu Qiao,et al.  MEAD: A Large-Scale Audio-Visual Dataset for Emotional Talking-Face Generation , 2020, ECCV.

[53]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[54]  Maja Pantic,et al.  Realistic Speech-Driven Facial Animation with GANs , 2019, International Journal of Computer Vision.

[55]  Gordon Wetzstein,et al.  Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations , 2019, NeurIPS.

[56]  V. Lempitsky,et al.  Few-Shot Adversarial Learning of Realistic Neural Talking Head Models , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[57]  Hao Li,et al.  PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[58]  Chenliang Xu,et al.  Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Michael J. Black,et al.  Capture, Learning, and Synthesis of 3D Speaking Styles , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Andreas Rössler,et al.  FaceForensics++: Learning to Detect Manipulated Facial Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[61]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Hang Zhou,et al.  Talking Face Generation by Adversarially Disentangled Audio-Visual Representation , 2018, AAAI.

[63]  Jingwen Zhu,et al.  Talking Face Generation by Conditional Recurrent Adversarial Network , 2018, IJCAI.

[64]  Bo Li,et al.  SECOND: Sparsely Embedded Convolutional Detection , 2018, Sensors.

[65]  Andrew Zisserman,et al.  X2Face: A network for controlling face generation by using images, audio, and pose codes , 2018, ECCV.

[66]  Chenliang Xu,et al.  Lip Movements Generation at a Glance , 2018, ECCV.

[67]  Andreas Rössler,et al.  FaceForensics: A Large-scale Video Dataset for Forgery Detection in Human Faces , 2018, ArXiv.

[68]  Laurens van der Maaten,et al.  3D Semantic Segmentation with Submanifold Sparse Convolutional Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[69]  Yisong Yue,et al.  A deep learning approach for generalized speech animation , 2017, ACM Trans. Graph..

[70]  Hai Xuan Pham,et al.  Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[71]  Joon Son Chung,et al.  You said that? , 2017, BMVC.

[72]  Joon Son Chung,et al.  Lip Reading in the Wild , 2016, ACCV.

[73]  Joon Son Chung,et al.  Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.

[74]  Chong Wang,et al.  Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.

[75]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[76]  Lina J. Karam,et al.  A no-reference perceptual image sharpness metric based on a cumulative probability of blur detection , 2009, 2009 International Workshop on Quality of Multimedia Experience.

[77]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[78]  Matthew Brand,et al.  Voice puppetry , 1999, SIGGRAPH.

[79]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[80]  Christoph Bregler,et al.  Video Rewrite: Driving Visual Speech with Audio , 1997, SIGGRAPH.

[81]  Nelson L. Max,et al.  Optical Models for Direct Volume Rendering , 1995, IEEE Trans. Vis. Comput. Graph..

[82]  C. G. Fisher,et al.  Confusions among visually perceived consonants. , 1968, Journal of speech and hearing research.