Multimodal Knowledge Alignment with Reinforcement Learning
Ronan Le Bras | Yejin Choi | Rowan Zellers | Jack Hessel | Prithviraj Ammanabrolu | Youngjae Yu | Gunhee Kim | Ximing Lu | Heeseung Yun | Jiwan Chung | J. Park
[1] Mohit Bansal,et al. Fine-grained Image Captioning with CLIP Reward , 2022, NAACL-HLT.
[2] Yejin Choi,et al. Aligning to Social Norms and Values in Interactive Narratives , 2022, NAACL.
[3] Oriol Vinyals,et al. Flamingo: a Visual Language Model for Few-Shot Learning , 2022, NeurIPS.
[4] Marc-Alexandre Côté,et al. ScienceWorld: Is your Agent Smarter than a 5th Grader? , 2022, EMNLP.
[5] Ari S. Morcos,et al. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time , 2022, ICML.
[6] Ryan J. Lowe,et al. Training language models to follow instructions with human feedback , 2022, NeurIPS.
[7] Percy Liang,et al. Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution , 2022, ICLR.
[8] Yejin Choi,et al. MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Yejin Choi,et al. Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer , 2021, NAACL.
[10] Lior Wolf,et al. ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Ron Mokady,et al. ClipCap: CLIP Prefix for Image Captioning , 2021, ArXiv.
[12] D. Song,et al. What Would Jiminy Cricket Do? Towards Agents That Behave Morally , 2021, NeurIPS Datasets and Benchmarks.
[13] J. Bello,et al. Wav2CLIP: Learning Robust Audio Representations from Clip , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Mark O. Riedl,et al. Situated Dialogue Learning through Procedural Environment Generation , 2021, ACL.
[15] Adams Wei Yu,et al. SimVLM: Simple Visual Language Model Pretraining with Weak Supervision , 2021, ICLR.
[16] Oriol Vinyals,et al. Multimodal Few-Shot Learning with Frozen Language Models , 2021, NeurIPS.
[17] Federico Raue,et al. Audioclip: Extending Clip to Image, Text and Audio , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Ali Farhadi,et al. MERLOT: Multimodal Neural Script Knowledge Models , 2021, NeurIPS.
[19] Gunhee Kim,et al. Transitional Adaptation of Pretrained Models for Visual Storytelling , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Yoshitaka Ushiku,et al. Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning , 2021, EACL.
[21] Brian Lester,et al. The Power of Scale for Parameter-Efficient Prompt Tuning , 2021, EMNLP.
[22] Idan Schwartz. Ensemble of MRR and NDCG models for Visual Dialog , 2021, NAACL.
[23] Zhilin Yang,et al. GPT Understands, Too , 2021, AI Open.
[24] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[25] Quoc V. Le,et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.
[26] Danqi Chen,et al. Making Pre-trained Language Models Better Few-shot Learners , 2021, ACL.
[27] Leonardo Neves,et al. TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification , 2020, FINDINGS.
[28] Vicente Ordonez,et al. Visual News: Benchmark and Challenges in News Image Captioning , 2020, EMNLP.
[29] Ryan J. Lowe,et al. Learning to summarize from human feedback , 2020, NeurIPS.
[30] Tom B. Brown,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[31] Andrew Zisserman,et al. Vggsound: A Large-Scale Audio-Visual Dataset , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Li Dong,et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks , 2020, ECCV.
[33] Mark O. Riedl,et al. Learning Norms from Stories: A Prior for Value Aligned Agents , 2019, AIES.
[34] Vishvak S. Murahari,et al. Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline , 2019, ECCV.
[35] Peter J. Liu,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res.
[36] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[37] Jason J. Corso,et al. Unified Vision-Language Pre-Training for Image Captioning and VQA , 2019, AAAI.
[38] Matthew J. Hausknecht,et al. Interactive Fiction Games: A Colossal Adventure , 2019, AAAI.
[39] Yejin Choi,et al. Counterfactual Story Reasoning and Generation , 2019, EMNLP.
[40] Christopher Joseph Pal,et al. Interactive Language Learning by Question Answering , 2019, EMNLP.
[41] Nassir Navab,et al. Towards Unsupervised Image Captioning With Shared Multimodal Embeddings , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[42] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[43] Jason Weston,et al. Neural Text Generation with Unlikelihood Training , 2019, ICLR.
[44] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[45] Gunhee Kim,et al. AudioCaps: Generating Captions for Audios in The Wild , 2019, NAACL.
[46] Yejin Choi,et al. The Curious Case of Neural Text Degeneration , 2019, ICLR.
[47] Dan Jurafsky,et al. Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts , 2019, EMNLP.
[48] Dimosthenis Karatzas,et al. Good News, Everyone! Context Driven Entity-Aware Captioning for News Images , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Yang Feng,et al. Unsupervised Image Captioning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Jason Weston,et al. Engaging Image Captioning via Personality , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[51] William Yang Wang,et al. WikiHow: A Large Scale Text Summarization Dataset , 2018, ArXiv.
[52] Mark O. Riedl,et al. Controllable Neural Story Plot Generation via Reinforcement Learning , 2018.
[53] Radu Soricut,et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning , 2018, ACL.
[54] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[55] Zhe Gan,et al. StyleNet: Generating Attractive Visual Captions with Styles , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[57] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[58] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2016, International Journal of Computer Vision.
[59] Vaibhava Goel,et al. Self-Critical Sequence Training for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[60] José M. F. Moura,et al. Visual Dialog , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Francis Ferraro,et al. Visual Storytelling , 2016, NAACL.
[62] Nathanael Chambers,et al. A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories , 2016, NAACL.
[63] S. Chopra,et al. Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.
[64] Lexing Xie,et al. SentiCap: Generating Image Descriptions with Sentiments , 2015, AAAI.
[65] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[66] Gunhee Kim,et al. Joint photo stream and blog post summarization and exploration , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[67] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[68] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[69] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[70] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[71] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[72] A. Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[73] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[74] Percy Liang,et al. Prefix-Tuning: Optimizing Continuous Prompts for Generation , 2021, ACL.
[75] Luca Paolini,et al. Models , 2021, Encyclopedia of Gerontology and Population Aging.
[76] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[77] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019.
[78] Mark O. Riedl,et al. Improvisational Storytelling Agents , 2017.
[79] Alec Go,et al. Twitter Sentiment Classification using Distant Supervision , 2009.
[80] Shlomo Argamon,et al. Effects of Age and Gender on Blogging , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.