InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists
[1] Jiannan Wu, et al. VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks, 2023, NeurIPS.
[2] Jianfeng Gao, et al. A Simple Framework for Open-Vocabulary Segmentation and Detection, 2023, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).
[3] Shalini De Mello, et al. Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models, 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Yong Jae Lee, et al. Generalized Decoding for Pixel, Image, and Language, 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Chunhua Shen, et al. Images Speak in Images: A Generalist Painter for In-Context Visual Learning, 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Alexei A. Efros, et al. InstructPix2Pix: Learning to Follow Image Editing Instructions, 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Hongsheng Li, et al. Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks, 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] P. Luo, et al. DiffusionDet: Diffusion Model for Object Detection, 2022, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).
[9] Yaniv Taigman, et al. Make-A-Video: Text-to-Video Generation without Text-Video Data, 2022, ICLR.
[10] Alexei A. Efros, et al. Visual Prompting via Image Inpainting, 2022, NeurIPS.
[11] Li Dong, et al. Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, 2022, ArXiv.
[12] Jonathan Ho, et al. Classifier-Free Diffusion Guidance, 2022, ArXiv.
[13] Aniruddha Kembhavi, et al. Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks, 2022, ICLR.
[14] David J. Fleet, et al. A Unified Sequence Interface for Vision Tasks, 2022, NeurIPS.
[15] Tero Karras, et al. Elucidating the Design Space of Diffusion-Based Generative Models, 2022, NeurIPS.
[16] David J. Fleet, et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, 2022, NeurIPS.
[17] André Susano Pinto, et al. UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes, 2022, NeurIPS.
[18] Zirui Wang, et al. CoCa: Contrastive Captioners are Image-Text Foundation Models, 2022, Trans. Mach. Learn. Res.
[19] Prafulla Dhariwal, et al. Hierarchical Text-Conditional Image Generation with CLIP Latents, 2022, ArXiv.
[20] Junjun Jiang, et al. BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation, 2022, ArXiv.
[21] Xianming Liu, et al. DepthFormer: Exploiting Long-range Correlation and Local Information for Accurate Monocular Depth Estimation, 2022, Machine Intelligence Research.
[22] P. Battaglia, et al. Transframer: Arbitrary Frame Prediction with Generative Models, 2022, Trans. Mach. Learn. Res.
[23] Jingren Zhou, et al. OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework, 2022, ICML.
[24] B. Ommer, et al. High-Resolution Image Synthesis with Latent Diffusion Models, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Prafulla Dhariwal, et al. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, 2021, ICML.
[26] A. Schwing, et al. Masked-attention Mask Transformer for Universal Image Segmentation, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Xizhou Zhu, et al. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Jong-Chul Ye, et al. CLIPstyler: Image Style Transfer with a Single Text Condition, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Lu Yuan, et al. Florence: A New Foundation Model for Computer Vision, 2021, ArXiv.
[30] David J. Fleet, et al. Palette: Image-to-Image Diffusion Models, 2021, SIGGRAPH.
[31] David J. Fleet, et al. Pix2seq: A Language Modeling Framework for Object Detection, 2021, ICLR.
[32] David J. Fleet, et al. Cascaded Diffusion Models for High Fidelity Image Generation, 2021, J. Mach. Learn. Res.
[33] Prafulla Dhariwal, et al. Diffusion Models Beat GANs on Image Synthesis, 2021, NeurIPS.
[34] David J. Fleet, et al. Image Super-Resolution via Iterative Refinement, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[35] B. Ommer, et al. Taming Transformers for High-Resolution Image Synthesis, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Abhishek Kumar, et al. Score-Based Generative Modeling through Stochastic Differential Equations, 2020, ICLR.
[37] Pieter Abbeel, et al. Denoising Diffusion Probabilistic Models, 2020, NeurIPS.
[38] Peter J. Liu, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[39] Chi-Keung Tang, et al. FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Yang Song, et al. Generative Modeling by Estimating Gradients of the Data Distribution, 2019, NeurIPS.
[41] Bolei Zhou, et al. Scene Parsing through ADE20K Dataset, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Ross B. Girshick, et al. Mask R-CNN, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[43] Bolei Zhou, et al. Semantic Understanding of Scenes Through the ADE20K Dataset, 2016, International Journal of Computer Vision.
[44] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Surya Ganguli, et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics, 2015, ICML.
[46] Bolei Zhou, et al. Learning Deep Features for Scene Recognition using Places Database, 2014, NIPS.
[47] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[48] Derek Hoiem, et al. Indoor Segmentation and Support Inference from RGBD Images, 2012, ECCV.
[49] C. V. Jawahar, et al. Cats and Dogs, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[50] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[51] Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes, 2013, ICLR.
[52] Jianguo Zhang, et al. The PASCAL Visual Object Classes Challenge, 2006.