Language Models Can See: Plugging Visual Controls in Text Generation
Yixuan Su | Tian Lan | Yahui Liu | Fangyu Liu | Dani Yogatama | Yan Wang | Lingpeng Kong | Nigel Collier