Fusing Pre-Trained Language Models with Multimodal Prompts through Reinforcement Learning
Ronan Le Bras, Yejin Choi, Rowan Zellers, Jack Hessel, Youngjae Yu, Gunhee Kim, Ximing Lu, Heeseung Yun, Jiwan Chung, J. Park, Prithviraj Ammanabrolu
[1] Mohit Bansal,et al. Fine-grained Image Captioning with CLIP Reward , 2022, NAACL-HLT.
[2] Dani Yogatama,et al. Language Models Can See: Plugging Visual Controls in Text Generation , 2022, ArXiv.
[3] Marc-Alexandre Côté,et al. ScienceWorld: Is your Agent Smarter than a 5th Grader? , 2022, EMNLP.
[4] Ari S. Morcos,et al. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time , 2022, ICML.
[5] Ryan J. Lowe,et al. Training language models to follow instructions with human feedback , 2022, NeurIPS.
[6] Percy Liang,et al. Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution , 2022, ICLR.
[7] Yejin Choi,et al. MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Yejin Choi,et al. Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer , 2021, NAACL.
[9] Lior Wolf,et al. ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Ron Mokady,et al. ClipCap: CLIP Prefix for Image Captioning , 2021, ArXiv.
[11] J. Bello,et al. Wav2CLIP: Learning Robust Audio Representations from CLIP , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Mark O. Riedl,et al. Situated Dialogue Learning through Procedural Environment Generation , 2021, ACL.
[13] Adams Wei Yu,et al. SimVLM: Simple Visual Language Model Pretraining with Weak Supervision , 2021, ICLR.
[14] Oriol Vinyals,et al. Multimodal Few-Shot Learning with Frozen Language Models , 2021, NeurIPS.
[15] Federico Raue,et al. AudioCLIP: Extending CLIP to Image, Text and Audio , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Ali Farhadi,et al. MERLOT: Multimodal Neural Script Knowledge Models , 2021, NeurIPS.
[17] Gunhee Kim,et al. Transitional Adaptation of Pretrained Models for Visual Storytelling , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Yoshitaka Ushiku,et al. Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning , 2021, EACL.
[19] Brian Lester,et al. The Power of Scale for Parameter-Efficient Prompt Tuning , 2021, EMNLP.
[20] Idan Schwartz. Ensemble of MRR and NDCG models for Visual Dialog , 2021, NAACL.
[21] Zhengxiao Du,et al. GPT Understands, Too , 2021, AI Open.
[22] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[23] Quoc V. Le,et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.
[24] Yejin Choi,et al. Social Chemistry 101: Learning to Reason about Social and Moral Norms , 2020, EMNLP.
[25] Leonardo Neves,et al. TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification , 2020, Findings of EMNLP.
[26] Vicente Ordonez,et al. Visual News: Benchmark and Challenges in News Image Captioning , 2020, EMNLP.
[27] Ryan J. Lowe,et al. Learning to summarize from human feedback , 2020, NeurIPS.
[28] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[29] Andrew Zisserman,et al. Vggsound: A Large-Scale Audio-Visual Dataset , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Yejin Choi,et al. VisualCOMET: Reasoning About the Dynamic Context of a Still Image , 2020, ECCV.
[31] Jianfeng Gao,et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks , 2020, ECCV.
[32] Tomohide Shibata. Understand in 5 Minutes!? Skim-Reading Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020.
[33] Vishvak S. Murahari,et al. Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline , 2019, ECCV.
[34] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[35] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[36] Jason J. Corso,et al. Unified Vision-Language Pre-Training for Image Captioning and VQA , 2019, AAAI.
[37] Matthew J. Hausknecht,et al. Interactive Fiction Games: A Colossal Adventure , 2019, AAAI.
[38] Yejin Choi,et al. Counterfactual Story Reasoning and Generation , 2019, EMNLP.
[39] Christopher Joseph Pal,et al. Interactive Language Learning by Question Answering , 2019, EMNLP.
[40] Nassir Navab,et al. Towards Unsupervised Image Captioning With Shared Multimodal Embeddings , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[41] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[42] Jason Weston,et al. Neural Text Generation with Unlikelihood Training , 2019, ICLR.
[43] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[44] Yejin Choi,et al. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction , 2019, ACL.
[45] Gunhee Kim,et al. AudioCaps: Generating Captions for Audios in The Wild , 2019, NAACL.
[46] Yejin Choi,et al. The Curious Case of Neural Text Degeneration , 2019, ICLR.
[47] Dan Jurafsky,et al. Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts , 2019, EMNLP.
[48] Dimosthenis Karatzas,et al. Good News, Everyone! Context Driven Entity-Aware Captioning for News Images , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Yang Feng,et al. Unsupervised Image Captioning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Jason Weston,et al. Engaging Image Captioning via Personality , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[51] William Yang Wang,et al. WikiHow: A Large Scale Text Summarization Dataset , 2018, ArXiv.
[52] Mark O. Riedl,et al. Controllable Neural Story Plot Generation via Reinforcement Learning , 2018 .
[53] Yejin Choi,et al. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference , 2018, EMNLP.
[54] Radu Soricut,et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning , 2018, ACL.
[55] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[56] Zhe Gan,et al. StyleNet: Generating Attractive Visual Captions with Styles , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[58] Juan Carlos Niebles,et al. Dense-Captioning Events in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[59] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[60] Vaibhava Goel,et al. Self-Critical Sequence Training for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] José M. F. Moura,et al. Visual Dialog , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[62] Francis Ferraro,et al. Visual Storytelling , 2016, NAACL.
[63] Nathanael Chambers,et al. A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories , 2016, NAACL.
[64] S. Chopra,et al. Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.
[65] Lexing Xie,et al. SentiCap: Generating Image Descriptions with Sentiments , 2015, AAAI.
[66] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[67] Gunhee Kim,et al. Joint photo stream and blog post summarization and exploration , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[68] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[69] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[70] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[71] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[72] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization.
[73] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[74] Percy Liang,et al. Prefix-Tuning: Optimizing Continuous Prompts for Generation , 2021, ACL.
[75] Ronan Le Bras,et al. Delphi: Towards Machine Ethics and Norms , 2021, ArXiv.
[76] Yejin Choi,et al. ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning , 2019, AAAI.
[77] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[78] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[79] Mark O. Riedl,et al. Improvisational Storytelling Agents , 2017 .
[81] Alec Go,et al. Twitter Sentiment Classification using Distant Supervision , 2009.
[82] Shlomo Argamon,et al. Effects of Age and Gender on Blogging , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.
[83] Jean Carletta,et al. Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization , 2005, ACL 2005.