GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation
Ran Xu, Zeyuan Chen, Ning Yu, Caiming Xiong, Shu Zhang, Chen Xing, Can Qin, Yun Fu, Stefano Ermon
[1] Wenqi Shao,et al. Align, Adapt and Inject: Sound-guided Unified Image Generation , 2023, ArXiv.
[2] Hubert P. H. Shum,et al. On the Design Fundamentals of Diffusion Models: A Survey , 2023, ArXiv.
[3] Xintao Wang,et al. T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models , 2023, AAAI.
[4] Maneesh Agrawala,et al. Adding Conditional Control to Text-to-Image Diffusion Models , 2023, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).
[5] Juan Carlos Niebles,et al. ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Alexei A. Efros,et al. InstructPix2Pix: Learning to Follow Image Editing Instructions , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Ledell Yu Wu,et al. AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities , 2022, ACL.
[8] Bryan Catanzaro,et al. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers , 2022, ArXiv.
[9] Ludwig Schmidt,et al. LAION-5B: An open large-scale dataset for training next generation image-text models , 2022, NeurIPS.
[10] Radu Tudor Ionescu,et al. Diffusion Models in Vision: A Survey , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] Jonathan Ho. Classifier-Free Diffusion Guidance , 2022, ArXiv.
[12] Zhe Gan,et al. NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis , 2022, NeurIPS.
[13] Doris Y. Tsao,et al. On the principles of Parsimony and Self-consistency for the emergence of intelligence , 2022, Frontiers of Information Technology & Electronic Engineering.
[14] Jing Yu Koh,et al. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation , 2022, Trans. Mach. Learn. Res..
[15] Ashish V. Thapliyal,et al. Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset , 2022, EMNLP.
[16] David J. Fleet,et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding , 2022, NeurIPS.
[17] Prafulla Dhariwal,et al. Hierarchical Text-Conditional Image Generation with CLIP Latents , 2022, ArXiv.
[18] Tristan Thrush,et al. Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Yaniv Taigman,et al. Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors , 2022, ECCV.
[20] Yi Ren,et al. Pseudo Numerical Methods for Diffusion Models on Manifolds , 2022, ICLR.
[21] Y. Fu,et al. Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework , 2022, ICLR.
[22] S. Hoi,et al. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation , 2022, ICML.
[23] B. Ommer,et al. High-Resolution Image Synthesis with Latent Diffusion Models , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Prafulla Dhariwal,et al. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models , 2021, ICML.
[25] Wonmin Byeon,et al. Sound-Guided Semantic Image Manipulation , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Ruiyi Zhang,et al. Towards Language-Free Training for Text-to-Image Generation , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Jian Liang,et al. NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion , 2021, ECCV.
[28] Jenia Jitsev,et al. LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs , 2021, ArXiv.
[29] Jing Yu Koh,et al. Vector-quantized Image Modeling with Improved VQGAN , 2021, ICLR.
[30] Federico Raue,et al. AudioCLIP: Extending CLIP to Image, Text and Audio , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Yelong Shen,et al. LoRA: Low-Rank Adaptation of Large Language Models , 2021, ICLR.
[32] Stefano Ermon,et al. D2C: Diffusion-Decoding Models for Few-Shot Conditional Generation , 2021, NeurIPS.
[33] Jan Kautz,et al. Score-based Generative Modeling in Latent Space , 2021, NeurIPS.
[34] Chang Zhou,et al. CogView: Mastering Text-to-Image Generation via Transformers , 2021, NeurIPS.
[35] A. Dosovitskiy,et al. MLP-Mixer: An all-MLP Architecture for Vision , 2021, NeurIPS.
[36] Andreas Dengel,et al. ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio , 2021, 2021 International Joint Conference on Neural Networks (IJCNN).
[37] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[38] Alec Radford,et al. Zero-Shot Text-to-Image Generation , 2021, ICML.
[39] Abhishek Kumar,et al. Score-Based Generative Modeling through Stochastic Differential Equations , 2020, ICLR.
[40] Holger Schwenk,et al. Beyond English-Centric Multilingual Machine Translation , 2020, J. Mach. Learn. Res..
[41] Xiaoyuan Jing,et al. DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis , 2020, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Pieter Abbeel,et al. Denoising Diffusion Probabilistic Models , 2020, NeurIPS.
[43] Qingming Huang,et al. Towards Discriminability and Diversity: Batch Nuclear-Norm Maximization Under Label Insufficient Situations , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Jacob Devlin,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[45] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[46] C.-C. Jay Kuo,et al. PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation , 2019, NeurIPS.
[47] Myle Ott,et al. Unsupervised Cross-lingual Representation Learning at Scale , 2019, ACL.
[48] Peter J. Liu,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[49] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[50] Yang Song,et al. Generative Modeling by Estimating Gradients of the Data Distribution , 2019, NeurIPS.
[51] Holger Schwenk,et al. WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia , 2019, EACL.
[52] Jianmin Wang,et al. Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation , 2019, ICML.
[53] Nenghai Yu,et al. Semantics Disentangling for Text-To-Image Generation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Wei Chen,et al. DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-To-Image Synthesis , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Kyung-Ah Sohn,et al. Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network , 2018, ECCV.
[56] Zhe Gan,et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[57] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[58] Oriol Vinyals,et al. Neural Discrete Representation Learning , 2017, NIPS.
[59] Kyoung Mu Lee,et al. Enhanced Deep Residual Networks for Single Image Super-Resolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[60] Sepp Hochreiter,et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.
[61] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[62] Michael I. Jordan,et al. Conditional Adversarial Domain Adaptation , 2017, NeurIPS.
[63] Luca Benini,et al. Soft-to-Hard Vector Quantization for End-to-End Learned Compression of Images and Neural Networks , 2017, ArXiv.
[64] Dimitris N. Metaxas,et al. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[65] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016, ArXiv.
[66] Bernt Schiele,et al. Generative Adversarial Text to Image Synthesis , 2016, ICML.
[67] François Laviolette,et al. Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..
[68] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[69] Surya Ganguli,et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics , 2015, ICML.
[70] Justin Salamon,et al. A Dataset and Taxonomy for Urban Sound Research , 2014, ACM Multimedia.
[71] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[73] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[74] Ian J. Goodfellow,et al. Generative Adversarial Nets , 2014, NIPS.
[75] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008, J. Mach. Learn. Res..