The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion
[1] Y. Matias, et al. Dreamix: Video Diffusion Models are General Video Editors, 2023, ArXiv.
[2] Yang Zhang, et al. Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models, 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] P. Schramowski, et al. The Stable Artist: Steering Semantics in Diffusion Latent Space, 2022, ArXiv.
[4] D. Erhan, et al. Phenaki: Variable Length Video Generation From Open Domain Textual Description, 2022, ICLR.
[5] David J. Fleet, et al. Imagen Video: High Definition Video Generation with Diffusion Models, 2022, ArXiv.
[6] Yaniv Taigman, et al. Make-A-Video: Text-to-Video Generation without Text-Video Data, 2022, ICLR.
[7] J. Tenenbaum, et al. Prompt-to-Prompt Image Editing with Cross Attention Control, 2022, ICLR.
[8] Dani Lischinski, et al. Blended Latent Diffusion, 2022, ACM Trans. Graph.
[9] J. Tenenbaum, et al. Compositional Visual Generation with Composable Diffusion Models, 2022, ECCV.
[10] Xiaoguang Han, et al. Expressive Talking Head Generation with Granular Audio-Visual Control, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Wayne Wu, et al. EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model, 2022, SIGGRAPH.
[12] Sang Ho Yoon, et al. Sound-Guided Semantic Video Generation, 2022, ECCV.
[13] Prafulla Dhariwal, et al. Hierarchical Text-Conditional Image Generation with CLIP Latents, 2022, ArXiv.
[14] David J. Fleet, et al. Video Diffusion Models, 2022, NeurIPS.
[15] Mohamed Elhoseiny, et al. StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] B. Ommer, et al. High-Resolution Image Synthesis with Latent Diffusion Models, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Wonmin Byeon, et al. Sound-Guided Semantic Image Manipulation, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Cordelia Schmid, et al. CCVS: Context-aware Controllable Video Synthesis, 2021, NeurIPS.
[19] Jaakko Lehtinen, et al. Alias-Free Generative Adversarial Networks, 2021, NeurIPS.
[20] Xun Cao, et al. Audio-Driven Emotional Video Portraits, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Mohamed Elhoseiny, et al. Aligning Latent and Image Spaces to Connect the Unconnectable, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[22] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[23] Dasaem Jeong, et al. TräumerAI: Dreaming Music with StyleGAN, 2021, ArXiv.
[24] C. V. Jawahar, et al. A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild, 2020, ACM Multimedia.
[25] Anoop Cherian, et al. Sound2Sight: Generating Visual Dynamics from Sound and Context, 2020, ECCV.
[26] Andrew Zisserman, et al. VGGSound: A Large-Scale Audio-Visual Dataset, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Jun Zhu, et al. Automatic Realistic Music Video Generation from Segments of Youtube Videos, 2019, ArXiv.
[28] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[29] Sjoerd van Steenkiste, et al. Towards Accurate Generative Models of Video: A New Metric & Challenges, 2018, ArXiv.
[30] Oriol Vinyals, et al. Representation Learning with Contrastive Predictive Coding, 2018, ArXiv.
[31] Andrew Zisserman, et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Fabio Viola, et al. The Kinetics Human Action Video Dataset, 2017, ArXiv.
[33] Wojciech Zaremba, et al. Improved Techniques for Training GANs, 2016, NIPS.
[34] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Thomas Brox, et al. U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015, MICCAI.
[36] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[37] S. Hochreiter, et al. Long Short-Term Memory, 1997, Neural Computation.