ControlVideo: Training-free Controllable Text-to-Video Generation
Yabo Zhang | Yuxiang Wei | Dongsheng Jiang | Xiaopeng Zhang | Wangmeng Zuo | Qi Tian