Make-A-Video: Text-to-Video Generation without Text-Video Data
Yaniv Taigman | Devi Parikh | Sonal Gupta | Songyang Zhang | Xiaoyue Yin | Qiyuan Hu | Uriel Singer | Thomas Hayes | Harry Yang | Jie An | Oron Ashual | Oran Gafni | Adam Polyak
References
[1] Ludwig Schmidt et al. LAION-5B: An open large-scale dataset for training next generation image-text models. NeurIPS, 2022.
[2] Jing Yu Koh et al. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation. Trans. Mach. Learn. Res., 2022.
[3] Junyan Zhu et al. On Aliased Resizing and Surprising Subtleties in GAN Evaluation. CVPR, 2022.
[4] Wendi Zheng et al. CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers. ICLR, 2022.
[5] David J. Fleet et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. NeurIPS, 2022.
[6] Jie Tang et al. CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers. NeurIPS, 2022.
[7] Devi Parikh et al. MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration. ECCV, 2022.
[8] Prafulla Dhariwal et al. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv, 2022.
[9] David J. Fleet et al. Video Diffusion Models. NeurIPS, 2022.
[10] Devi Parikh et al. Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer. ECCV, 2022.
[11] Yaniv Taigman et al. Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors. ECCV, 2022.
[12] Jinwoo Shin et al. Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks. ICLR, 2022.
[13] B. Curless et al. FILM: Frame Interpolation for Large Motion. ECCV, 2022.
[14] B. Ommer et al. High-Resolution Image Synthesis with Latent Diffusion Models. CVPR, 2022.
[15] Prafulla Dhariwal et al. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. ICML, 2021.
[16] Fang Wen et al. Vector Quantized Diffusion Model for Text-to-Image Synthesis. CVPR, 2022.
[17] Jian Liang et al. NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion. ECCV, 2021.
[18] B. Guo et al. Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions. CVPR, 2022.
[19] Jing Yu Koh et al. Vector-quantized Image Modeling with Improved VQGAN. ICLR, 2021.
[20] Guillermo Sapiro et al. GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions. arXiv, 2021.
[21] Dimitris N. Metaxas et al. A Good Image Generator Is What You Need for High-Resolution Video Synthesis. ICLR, 2021.
[22] Andrew Zisserman et al. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval. ICCV, 2021.
[23] Ilya Sutskever et al. Learning Transferable Visual Models From Natural Language Supervision. ICML, 2021.
[24] Jing Yu Koh et al. Cross-Modal Contrastive Learning for Text-to-Image Generation. CVPR, 2021.
[25] Pieter Abbeel et al. Denoising Diffusion Probabilistic Models. NeurIPS, 2020.
[26] Deeptha Girish et al. Understanding action recognition in still images. CVPR Workshops (CVPRW), 2020.
[27] Mark Chen et al. Language Models are Few-Shot Learners. NeurIPS, 2020.
[28] Xin Wang et al. Cross-Modal Dual Learning for Sentence-to-Video Generation. ACM Multimedia, 2019.
[29] Omer Levy et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv, 2019.
[30] Masanori Koyama et al. Train Sparsely, Generate Densely: Memory-Efficient Unsupervised Training of High-Resolution Temporal GAN. International Journal of Computer Vision, 2018.
[31] Liqiang Zhang et al. 3D Depthwise Convolution: Reducing Model Parameters in 3D Vision Tasks. Canadian Conference on AI, 2018.
[32] Ali Farhadi et al. Imagine This! Scripts to Compositions to Videos. ECCV, 2018.
[33] Seunghoon Hong et al. Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis. CVPR, 2018.
[34] Chen Sun et al. Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification. ECCV, 2017.
[35] Zhe Gan et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. CVPR, 2018.
[36] Tao Mei et al. To Create What You Tell: Generating Videos from Captions. ACM Multimedia, 2017.
[37] Yitong Li et al. Video Generation From Text. AAAI, 2017.
[38] Tao Mei et al. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. ICCV, 2017.
[39] Vineeth N. Balasubramanian et al. Attentive Semantic Video Generation Using Captions. ICCV, 2017.
[40] Lukasz Kaiser et al. Attention Is All You Need. NIPS, 2017.
[41] Dimitris N. Metaxas et al. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks. ICCV, 2017.
[42] Vineeth N. Balasubramanian et al. Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures. ACM Multimedia, 2016.
[43] François Chollet et al. Xception: Deep Learning with Depthwise Separable Convolutions. CVPR, 2017.
[44] Bernt Schiele et al. Generative Adversarial Text to Image Synthesis. ICML, 2016.
[45] Tao Mei et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language. CVPR, 2016.
[46] Mubarak Shah et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild. arXiv, 2012.