Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Seung Wook Kim | S. Fidler | Huan Ling | Karsten Kreis | Tim Dockhorn | A. Blattmann | Robin Rombach
[1] Patrick Esser,et al. Structure and Content-Guided Video Synthesis with Diffusion Models , 2023, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).
[2] Y. Matias,et al. Dreamix: Video Diffusion Models are General Video Editors , 2023, ArXiv.
[3] David J. Fleet,et al. Image Super-Resolution via Iterative Refinement , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[4] Bryan Catanzaro,et al. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers , 2022, ArXiv.
[5] S. Fidler,et al. LION: Latent Point Diffusion Models for 3D Shape Generation , 2022, NeurIPS.
[6] Karsten Kreis,et al. GENIE: Higher-Order Denoising Diffusion Solvers , 2022, NeurIPS.
[7] Diederik P. Kingma,et al. On Distillation of Guided Diffusion Models , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] D. Erhan,et al. Phenaki: Variable Length Video Generation From Open Domain Textual Description , 2022, ICLR.
[9] David J. Fleet,et al. Imagen Video: High Definition Video Generation with Diffusion Models , 2022, ArXiv.
[10] Yaniv Taigman,et al. Make-A-Video: Text-to-Video Generation without Text-Video Data , 2022, ICLR.
[11] Yuanzhen Li,et al. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Amit H. Bermano,et al. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion , 2022, ICLR.
[13] J. Tenenbaum,et al. Prompt-to-Prompt Image Editing with Cross Attention Control , 2022, ICLR.
[14] Jonathan Ho. Classifier-Free Diffusion Guidance , 2022, ArXiv.
[15] Jing Yu Koh,et al. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation , 2022, Trans. Mach. Learn. Res..
[16] Stefan Bauer,et al. Diffusion Models for Video Prediction and Infilling , 2022, Trans. Mach. Learn. Res..
[17] Alexei A. Efros,et al. Generating Long Videos of Dynamic Scenes , 2022, NeurIPS.
[18] Cheng Lu,et al. DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps , 2022, NeurIPS.
[19] Sonam Gupta,et al. RV-GAN: Recurrent GAN for Unconditional Video Generation , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[20] Wendi Zheng,et al. CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers , 2022, ICLR.
[21] Frank Wood,et al. Flexible Diffusion Modeling of Long Videos , 2022, NeurIPS.
[22] David J. Fleet,et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding , 2022, NeurIPS.
[23] Vikram S. Voleti,et al. MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation , 2022, ArXiv.
[24] Yongxin Chen,et al. Fast Sampling of Diffusion Models with Exponential Integrator , 2022, ICLR.
[25] Prafulla Dhariwal,et al. Hierarchical Text-Conditional Image Generation with CLIP Latents , 2022, ArXiv.
[26] Devi Parikh,et al. Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer , 2022, ECCV.
[27] David J. Fleet,et al. Video Diffusion Models , 2022, NeurIPS.
[28] S. Mandt,et al. Diffusion Probabilistic Modeling for Video Generation , 2022, Entropy.
[29] S. Ermon,et al. Dual Diffusion Implicit Bridges for Image-to-Image Translation , 2022, ICLR.
[30] Jinwoo Shin,et al. Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks , 2022, ICLR.
[31] Yi Ren,et al. Pseudo Numerical Methods for Diffusion Models on Manifolds , 2022, ICLR.
[32] Mohammad Norouzi,et al. Learning Fast Samplers for Diffusion Models by Differentiating Through Sample Quality , 2022, ICLR.
[33] Tim Salimans,et al. Progressive Distillation for Fast Sampling of Diffusion Models , 2022, ICLR.
[34] Andreas Geiger,et al. StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets , 2022, SIGGRAPH.
[35] Michael Elad,et al. Denoising Diffusion Restoration Models , 2022, NeurIPS.
[36] L. Gool,et al. RePaint: Inpainting using Denoising Diffusion Probabilistic Models , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Bo Zhang,et al. Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models , 2022, ICLR.
[38] Mohamed Elhoseiny,et al. StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2 , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[39] B. Ommer,et al. High-Resolution Image Synthesis with Latent Diffusion Models , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Prafulla Dhariwal,et al. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models , 2021, ICML.
[41] Karsten Kreis,et al. Tackling the Generative Learning Trilemma with Denoising Diffusion GANs , 2021, ICLR.
[42] Karsten Kreis,et al. Score-Based Generative Modeling with Critically-Damped Langevin Diffusion , 2021, ICLR.
[43] Supasorn Suwajanakorn,et al. Diffusion Autoencoders: Toward a Meaningful and Decodable Representation , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Jian Liang,et al. NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion , 2021, ECCV.
[45] B. Guo,et al. Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[46] David J. Fleet,et al. Palette: Image-to-Image Diffusion Models , 2021, SIGGRAPH.
[47] S. Ermon,et al. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations , 2021, ICLR.
[48] David J. Fleet,et al. Cascaded Diffusion Models for High Fidelity Image Generation , 2021, J. Mach. Learn. Res..
[49] Qi Li,et al. SRDiff: Single Image Super-Resolution with Diffusion Probabilistic Models , 2021, Neurocomputing.
[50] A. Rogozhnikov. Einops: Clear and Reliable Tensor Manipulations with Einstein-like Notation , 2022, ICLR.
[51] Christian Theobalt,et al. StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN , 2021, BMVC.
[52] Timo Milbich,et al. iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[53] Jaakko Lehtinen,et al. Alias-Free Generative Adversarial Networks , 2021, NeurIPS.
[54] Stefano Ermon,et al. D2C: Diffusion-Decoding Models for Few-Shot Conditional Generation , 2021, NeurIPS.
[55] Jan Kautz,et al. Score-based Generative Modeling in Latent Space , 2021, NeurIPS.
[56] Tal Kachman,et al. Gotta Go Fast When Generating Data with Score-Based Models , 2021, ArXiv.
[57] Prafulla Dhariwal,et al. Diffusion Models Beat GANs on Image Synthesis , 2021, NeurIPS.
[58] B. Ommer,et al. Stochastic Image-to-Video Synthesis using cINNs , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Guillermo Sapiro,et al. GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions , 2021, ArXiv.
[60] Dimitris N. Metaxas,et al. A Good Image Generator Is What You Need for High-Resolution Video Synthesis , 2021, ICLR.
[61] Pieter Abbeel,et al. VideoGPT: Video Generation using VQ-VAE and Transformers , 2021, ArXiv.
[62] Chris G. Willcocks,et al. UNIT-DDPM: UNpaired Image Translation with Denoising Diffusion Probabilistic Models , 2021, ArXiv.
[63] Andrew Zisserman,et al. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[64] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[65] Alec Radford,et al. Zero-Shot Text-to-Image Generation , 2021, ICML.
[66] Prafulla Dhariwal,et al. Improved Denoising Diffusion Probabilistic Models , 2021, ICML.
[67] Eric Luhman,et al. Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed , 2021, ArXiv.
[68] B. Ommer,et al. Taming Transformers for High-Resolution Image Synthesis , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[69] Jiaming Song,et al. Denoising Diffusion Implicit Models , 2020, ICLR.
[70] Pieter Abbeel,et al. Denoising Diffusion Probabilistic Models , 2020, NeurIPS.
[71] Diego de Las Casas,et al. Transformation-based Adversarial Video Prediction on Large-Scale Data , 2020, ArXiv.
[72] P. Gallinari,et al. Stochastic Latent Residual Video Prediction , 2020, ICML.
[73] Subramanian Ramamoorthy,et al. Lower Dimensional Kernels for Video Discriminators , 2019, Neural Networks.
[74] A. Dantcheva,et al. G3AN: Disentangling Appearance and Motion for Video Generation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[75] Tero Karras,et al. Analyzing and Improving the Image Quality of StyleGAN , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[76] Jakob Uszkoreit,et al. Scaling Autoregressive Video Models , 2019, ICLR.
[77] Masanori Koyama,et al. Train Sparsely, Generate Densely: Memory-Efficient Unsupervised Training of High-Resolution Temporal GAN , 2018, International Journal of Computer Vision.
[78] Yang Song,et al. Generative Modeling by Estimating Gradients of the Data Distribution , 2019, NeurIPS.
[79] Aaron C. Courville,et al. Improved Conditional VRNNs for Video Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[80] Timo Aila,et al. A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[81] Sjoerd van Steenkiste,et al. Towards Accurate Generative Models of Video: A New Metric & Challenges , 2018, ArXiv.
[82] Ali Farhadi,et al. Imagine This! Scripts to Compositions to Videos , 2018, ECCV.
[83] Sergey Levine,et al. Stochastic Adversarial Video Prediction , 2018, ArXiv.
[84] Rob Fergus,et al. Stochastic Video Generation with a Learned Prior , 2018, ICML.
[85] Sebastian Nowozin,et al. Which Training Methods for GANs do actually Converge? , 2018, ICML.
[86] Sergey Levine,et al. Stochastic Variational Video Prediction , 2017, ICLR.
[87] Yitong Li,et al. Video Generation From Text , 2017, AAAI.
[88] Tao Mei,et al. To Create What You Tell: Generating Videos from Captions , 2017, ACM Multimedia.
[89] Vineeth N. Balasubramanian,et al. Attentive Semantic Video Generation Using Captions , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[90] Sepp Hochreiter,et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.
[91] Seunghoon Hong,et al. Decomposing Motion and Content for Natural Video Sequence Prediction , 2017, ICLR.
[92] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[93] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[94] Vineeth N. Balasubramanian,et al. Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures , 2016, ACM Multimedia.
[95] Shunta Saito,et al. Temporal Generative Adversarial Nets with Singular Value Clipping , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[96] Alexei A. Efros,et al. Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[97] Antonio Torralba,et al. Generating Videos with Scene Dynamics , 2016, NIPS.
[98] Wojciech Zaremba,et al. Improved Techniques for Training GANs , 2016, NIPS.
[99] Tao Mei,et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[100] Surya Ganguli,et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics , 2015, ICML.
[101] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[102] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[103] Mubarak Shah,et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.
[104] Pascal Vincent,et al. A Connection Between Score Matching and Denoising Autoencoders , 2011, Neural Computation.
[105] Siwei Lyu,et al. Interpretation and Generalization of Score Matching , 2009, UAI.
[106] Aapo Hyvärinen,et al. Estimation of Non-Normalized Statistical Models by Score Matching , 2005, J. Mach. Learn. Res..