Paper Information - 3D-Aware Video Generation

3D-Aware Video Generation

Generative models have emerged as an essential building block for many image synthesis and editing tasks. Recent advances in this field have also enabled the generation of high-quality 3D or video content that exhibits either multi-view or temporal consistency. With our work, we explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos. By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D video supervised only with monocular videos. We show that our method learns a rich embedding of decomposable 3D structures and motions that enables new visual effects of spatio-temporal renderings while producing imagery with quality comparable to that of existing 3D or video GANs.
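As a rough illustration of what a time-conditioned neural implicit representation can look like, the sketch below queries a small randomly initialized MLP at 3D points and a time stamp, conditioned on separate content and motion codes, and volume-renders one ray. It is a minimal sketch under stated assumptions and not the authors' architecture: the layer sizes, the latent dimensions, the plain ReLU MLP, and the names `init_mlp`, `field`, and `render_ray` are all hypothetical, and no discriminator or training loop is shown.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a small fully connected network (hypothetical sizes)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def field(params, xyz, t, z_content, z_motion):
    """Map 3D points, a time stamp, and two latent codes to (density, rgb)."""
    n = xyz.shape[0]
    inp = np.concatenate([
        xyz,                          # (n, 3) spatial coordinates
        np.full((n, 1), t),           # (n, 1) time stamp shared by all points
        np.tile(z_content, (n, 1)),   # content code (assumed: identity/appearance)
        np.tile(z_motion, (n, 1)),    # motion code (assumed: dynamics)
    ], axis=1)
    h = inp
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)        # ReLU hidden layers
    w, b = params[-1]
    out = h @ w + b
    density = np.logaddexp(0.0, out[:, 0])    # softplus keeps density non-negative
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:4]))  # sigmoid keeps colors in (0, 1)
    return density, rgb

def render_ray(params, origin, direction, t, z_content, z_motion,
               near=0.5, far=2.5, n_samples=32):
    """Alpha-composite field samples along one camera ray at time t."""
    depths = np.linspace(near, far, n_samples)
    pts = origin + depths[:, None] * direction
    density, rgb = field(params, pts, t, z_content, z_motion)
    delta = (far - near) / (n_samples - 1)    # uniform sample spacing
    alpha = 1.0 - np.exp(-density * delta)
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return weights @ rgb, weights

rng = np.random.default_rng(0)
dim_content, dim_motion = 8, 4                # hypothetical latent sizes
params = init_mlp(rng, [3 + 1 + dim_content + dim_motion, 64, 64, 4])
z_content = rng.normal(size=dim_content)
z_motion = rng.normal(size=dim_motion)
origin = np.array([0.0, 0.0, -1.5])
direction = np.array([0.0, 0.0, 1.0])

# Same scene codes and camera ray, rendered at one time stamp.
color, weights = render_ray(params, origin, direction, 0.25, z_content, z_motion)
```

Holding `z_content` and `z_motion` fixed while varying the camera ray or `t` is what would let such a field be rendered from new viewpoints or at new time steps; in a GAN setting, the rendered frames would then be passed to a discriminator that also sees their time stamps.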
