Learning to Learn Words from Visual Scenes
Dídac Surís | Dave Epstein | Heng Ji | Shih-Fu Chang | Carl Vondrick
[1] Marcin Andrychowicz,et al. Learning to learn by gradient descent by gradient descent , 2016, NIPS.
[2] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[3] Tao Mei,et al. Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Yizhou Sun,et al. Few-Shot Representation Learning for Out-Of-Vocabulary Words , 2019, ACL.
[5] Angeliki Lazaridou,et al. Multimodal Word Meaning Induction From Minimal Exposure to Natural Text , 2017, Cognitive Science.
[6] Hinrich Schütze,et al. Attentive Mimicking: Better Word Embeddings by Attending to Informative Contexts , 2019, NAACL-HLT.
[7] Wilson L. Taylor,et al. “Cloze Procedure”: A New Tool for Measuring Readability , 1953 .
[8] Martin Wattenberg,et al. Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation , 2016, TACL.
[9] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[10] Tao Xiang,et al. Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[11] Yoshua Bengio,et al. On the Optimization of a Synaptic Learning Rule , 2007 .
[12] Noah D. Goodman,et al. Evaluating Compositionality in Sentence Embeddings , 2018, CogSci.
[13] Brenden M. Lake,et al. Mutual exclusivity as a challenge for deep neural networks , 2019, NeurIPS.
[14] Hang Li,et al. Meta-SGD: Learning to Learn Quickly for Few Shot Learning , 2017, ArXiv.
[15] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[16] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[17] David Reitter,et al. Fusion of Detected Objects in Text for Visual Question Answering , 2019, EMNLP.
[18] Yin Li,et al. Compositional Learning for Human Object Interaction , 2018, ECCV.
[19] Brenden M. Lake,et al. Compositional generalization through meta sequence-to-sequence learning , 2019, NeurIPS.
[20] Pieter Abbeel,et al. Meta Learning Shared Hierarchies , 2017, ICLR.
[21] Brenden M. Lake,et al. Mutual exclusivity as a challenge for neural networks , 2019, ArXiv.
[22] Chuang Gan,et al. The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences from Natural Supervision , 2019, ICLR.
[23] Cyrus Rashtchian,et al. Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.
[24] Nan Duan,et al. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training , 2019, AAAI.
[25] Trevor Darrell,et al. Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Bernt Schiele,et al. Zero-Shot Learning - The Good, the Bad and the Ugly , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Hugo Larochelle,et al. Optimization as a Model for Few-Shot Learning , 2016, ICLR.
[28] Cho-Jui Hsieh,et al. VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.
[29] Tao Mei,et al. Pointing Novel Objects in Image Captioning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Li Fei-Fei,et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Graham Neubig,et al. Cross-Lingual Word Embeddings for Low-Resource Language Modeling , 2017, EACL.
[32] Martial Hebert,et al. From Red Wine to Red Tomato: Composition with Context , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Peter L. Bartlett,et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning , 2016, ArXiv.
[34] Jianwei Yang,et al. Neural Baby Talk , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[35] Louis-Philippe Morency,et al. M-BERT: Injecting Multimodal Information in the BERT Structure , 2019, ArXiv.
[36] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Licheng Yu,et al. UNITER: Learning UNiversal Image-TExt Representations , 2019, ArXiv.
[38] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[39] Dima Damen,et al. Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[40] Jacob Eisenstein,et al. Mimicking Word Embeddings using Subword RNNs , 2017, EMNLP.
[41] Allyson Ettinger,et al. Assessing Composition in Sentence Vector Representations , 2018, COLING.
[42] Jianfeng Gao,et al. Unified Vision-Language Pre-Training for Image Captioning and VQA , 2020, AAAI.
[43] Navdeep Jaitly,et al. Pointer Networks , 2015, NIPS.
[44] P. Jusczyk,et al. Some Beginnings of Word Comprehension in 6-Month-Olds , 1999 .
[45] Yi Yang,et al. Decoupled Novel Object Captioner , 2018, ACM Multimedia.
[46] Hinrich Schütze,et al. Rare Words: A Major Problem for Contextualized Embeddings And How to Fix it by Attentive Mimicking , 2019, AAAI.
[47] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .
[48] Samuel R. Bowman,et al. Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark , 2019, ACL.
[49] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[50] Kilian Q. Weinberger,et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.
[51] Mikhail Khodak,et al. A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors , 2018, ACL.
[52] Richard S. Zemel,et al. Prototypical Networks for Few-shot Learning , 2017, NIPS.
[53] Pieter Abbeel,et al. A Simple Neural Attentive Meta-Learner , 2017, ICLR.
[54] Cordelia Schmid,et al. Contrastive Bidirectional Transformer for Temporal Representation Learning , 2019, ArXiv.
[55] Furu Wei,et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations , 2019, ICLR.
[56] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[57] Dima Damen,et al. Scaling Egocentric Vision: The EPIC-KITCHENS Dataset , 2018, ArXiv.
[58] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[59] Holger Schwenk,et al. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond , 2018, Transactions of the Association for Computational Linguistics.
[60] Pieter Abbeel,et al. Meta-Learning with Temporal Convolutions , 2017, ArXiv.
[61] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[62] Desmond Elliott,et al. Compositional Generalization in Image Captioning , 2019, CoNLL.
[63] Kristen Grauman,et al. Attributes as Operators , 2018, ECCV.
[64] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[65] Marco Baroni,et al. High-risk learning: acquiring new word vectors from tiny data , 2017, EMNLP.
[66] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).