Paper Information - Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation - 字舞流文

Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation

Bolei Zhou | Hang Zhou | Wayne Wu | Xian Liu | Yinghao Xu | Qianyi Wu

[1] D. Ramanan,et al. Depth-supervised NeRF: Fewer Views and Faster Training for Free , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Chen Change Loy,et al. Everybody’s Talkin’: Let Me Talk as You Want , 2020, IEEE Transactions on Information Forensics and Security.

[3] Bo Dai,et al. Generative Occupancy Fields for 3D Surface-Aware Image Synthesis , 2021, NeurIPS.

[4] Haozhe Wu,et al. Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis , 2021, ACM Multimedia.

[5] Deva Ramanan,et al. NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild , 2021, NeurIPS.

[6] Hujun Bao,et al. Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[7] Shuai Yi,et al. CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[8] Ravi Ramamoorthi,et al. NeLF: Neural Light-transport Field for Portrait View Synthesis and Relighting , 2021, EGSR.

[9] Changjie Fan,et al. Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion , 2021, IJCAI.

[10] Jonathan T. Barron,et al. HyperNeRF , 2021, ACM Trans. Graph.

[11] Yaron Lipman,et al. Volume Rendering of Neural Implicit Surfaces , 2021, NeurIPS.

[12] Taku Komura,et al. NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction , 2021, ArXiv.

[13] Christian Theobalt,et al. Neural actor , 2021, ACM Trans. Graph.

[14] Yu Ding,et al. Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Moustafa Meshry,et al. Learned Spatial Representations for Few-shot Talking-Head Synthesis , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[16] Chen Change Loy,et al. Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Xun Cao,et al. Audio-Driven Emotional Video Portraits , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Stephen Lin,et al. Neural Articulated Radiance Field , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[19] Angela Dai,et al. NPMs: Neural Parametric Models for 3D Deformable Shapes , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[20] Stefan Leutenegger,et al. In-Place Scene Labelling and Understanding with Implicit Scene Representation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[21] Pratul P. Srinivasan,et al. Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[22] H. Bao,et al. AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[23] Amit Raj,et al. Pixel-aligned Volumetric Avatars , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Hujun Bao,et al. Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Xiaolong Wang,et al. Learning Continuous Image Representation with Local Implicit Image Function , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Justus Thies,et al. Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Justus Thies,et al. Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Arun Mallya,et al. One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Francesc Moreno-Noguer,et al. D-NeRF: Neural Radiance Fields for Dynamic Scenes , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Jonathan T. Barron,et al. Nerfies: Deformable Neural Radiance Fields , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[31] Andreas Geiger,et al. GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32] Yaser Sheikh,et al. Audio- and Gaze-driven Facial Animation of Codec Avatars , 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).

[33] Yun-Ta Tsai,et al. Neural Light Transport for Relighting and View Synthesis , 2020, ACM Transactions on Graphics.

[34] Jonathan T. Barron,et al. NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Hujun Bao,et al. Animatable Neural Radiance Fields for Human Body Modeling , 2021, ArXiv.

[36] C. V. Jawahar,et al. A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild , 2020, ACM Multimedia.

[37] Dipanjan Das,et al. Speech-Driven Facial Animation Using Cascaded GANs for Learning of Motion and Texture , 2020, ECCV.

[38] Kyaw Zaw Lin,et al. Neural Sparse Voxel Fields , 2020, NeurIPS.

[39] Chenliang Xu,et al. Talking-head Generation with Rhythmic Head Motion , 2020, ECCV.

[40] Yang Zhou,et al. MakeItTalk , 2020, ACM Trans. Graph.

[41] Gordon Wetzstein,et al. Inferring Semantic Information with 3D Neural Scene Representations , 2020, ArXiv.

[42] Ronen Basri,et al. Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance , 2020, NeurIPS.

[43] Pratul P. Srinivasan,et al. NeRF , 2020, ECCV.

[44] Marc Pollefeys,et al. Convolutional Occupancy Networks , 2020, ECCV.

[45] Hujun Bao,et al. Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose , 2020, arXiv:2002.10137.

[46] Y. Lipman,et al. Implicit Geometric Regularization for Learning Shapes , 2020, ICML.

[47] Andreas Geiger,et al. Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48] Justus Thies,et al. Neural Voice Puppetry: Audio-driven Facial Reenactment , 2019, ECCV.

[49] Yinda Zhang,et al. DIST: Rendering Deep Implicit Signed Distance Function With Differentiable Sphere Tracing , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50] Lingyun Wu,et al. MaskGAN: Towards Diverse and Interactive Facial Image Manipulation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51] Yi Li,et al. Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning , 2018, IJCAI.

[52] Yu Qiao,et al. MEAD: A Large-Scale Audio-Visual Dataset for Emotional Talking-Face Generation , 2020, ECCV.

[53] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[54] Maja Pantic,et al. Realistic Speech-Driven Facial Animation with GANs , 2019, International Journal of Computer Vision.

[55] Gordon Wetzstein,et al. Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations , 2019, NeurIPS.

[56] V. Lempitsky,et al. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[57] Hao Li,et al. PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[58] Chenliang Xu,et al. Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[59] Michael J. Black,et al. Capture, Learning, and Synthesis of 3D Speaking Styles , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[60] Andreas Rössler,et al. FaceForensics++: Learning to Detect Manipulated Facial Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[61] Sebastian Nowozin,et al. Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[62] Hang Zhou,et al. Talking Face Generation by Adversarially Disentangled Audio-Visual Representation , 2018, AAAI.

[63] Jingwen Zhu,et al. Talking Face Generation by Conditional Recurrent Adversarial Network , 2018, IJCAI.

[64] Bo Li,et al. SECOND: Sparsely Embedded Convolutional Detection , 2018, Sensors.

[65] Andrew Zisserman,et al. X2Face: A network for controlling face generation by using images, audio, and pose codes , 2018, ECCV.

[66] Chenliang Xu,et al. Lip Movements Generation at a Glance , 2018, ECCV.

[67] Andreas Rössler,et al. FaceForensics: A Large-scale Video Dataset for Forgery Detection in Human Faces , 2018, ArXiv.

[68] Laurens van der Maaten,et al. 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[69] Yisong Yue,et al. A deep learning approach for generalized speech animation , 2017, ACM Trans. Graph.

[70] Hai Xuan Pham,et al. Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[71] Joon Son Chung,et al. You said that? , 2017, BMVC.

[72] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.

[73] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.

[74] Chong Wang,et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin , 2015, ICML.

[75] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[76] Lina J. Karam,et al. A no-reference perceptual image sharpness metric based on a cumulative probability of blur detection , 2009, 2009 International Workshop on Quality of Multimedia Experience.

[77] Eero P. Simoncelli,et al. Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[78] Matthew Brand,et al. Voice puppetry , 1999, SIGGRAPH.

[79] Thomas Vetter,et al. A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[80] Christoph Bregler,et al. Video Rewrite: Driving Visual Speech with Audio , 1997, SIGGRAPH.

[81] Nelson L. Max,et al. Optical Models for Direct Volume Rendering , 1995, IEEE Trans. Vis. Comput. Graph.

[82] C. G. Fisher,et al. Confusions among visually perceived consonants. , 1968, Journal of speech and hearing research.