Text-to-Image Diffusion Models are Zero-Shot Classifiers