NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
Fan Yang | Jianlong Fu | Huan Yang | Chenfei Wu | Lijuan Wang | Zicheng Liu | Houqiang Li | Linjie Li | Jianfeng Wang | Zhengyuan Yang | Nan Duan | Minheng Ni | Xiaodong Wang | Shuguang Liu | Sheng-Siang Yin | Gong Ming