Felix Hill | Adam Santoro | David Ding | Matt Botvinick
[1] Andrew Zisserman,et al. Video Representation Learning by Dense Predictive Coding , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[2] Pieter R. Roelfsema,et al. Object-based attention in the primary visual cortex of the macaque monkey , 1998, Nature.
[3] Tomasz Kornuta,et al. Object-Based Reasoning in VQA , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).
[4] Gal Chechik,et al. Learning Object Permanence from Video , 2020, ECCV.
[5] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[6] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[7] Zhe Chen. Object-based attention: A tutorial review , 2012, Attention, Perception, & Psychophysics.
[8] Chuang Gan,et al. Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding , 2018, NeurIPS.
[9] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[10] Bernd Finkbeiner,et al. Teaching Temporal Logics to Neural Networks , 2020, ICLR.
[11] Gary Marcus,et al. The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence , 2020, ArXiv.
[12] Aaron van den Oord,et al. Shaping Belief States with Generative Environment Models for RL , 2019, NeurIPS.
[13] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[14] James Demmel,et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes , 2019, ICLR.
[15] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[16] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[17] Sanja Fidler,et al. Skip-Thought Vectors , 2015, NIPS.
[18] Cho-Jui Hsieh,et al. VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.
[19] Klaus Greff,et al. Multi-Object Representation Learning with Iterative Variational Inference , 2019, ICML.
[20] Christopher D. Manning,et al. Compositional Attention Networks for Machine Reasoning , 2018, ICLR.
[21] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[22] Matthew Botvinick,et al. MONet: Unsupervised Scene Decomposition and Representation , 2019, ArXiv.
[23] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[24] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[25] Deva Ramanan,et al. CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning , 2020, ICLR.
[26] Guillaume Lample,et al. Deep Learning for Symbolic Mathematics , 2019, ICLR.
[27] Klaus Greff,et al. A Perspective on Objects and Systematic Generalization in Model-Based RL , 2019, ArXiv.
[28] Jiajun Wu,et al. Unsupervised Discovery of 3D Physical Objects from Video , 2020, ICLR.
[29] Razvan Pascanu,et al. Discovering objects and their relations from entangled scene representations , 2017, ICLR.
[30] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[31] Chuang Gan,et al. CLEVRER: CoLlision Events for Video REpresentation and Reasoning , 2020, ICLR.
[32] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[33] Sungjin Ahn,et al. SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition , 2020, ICLR.
[34] Li Fei-Fei,et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Cordelia Schmid,et al. Contrastive Bidirectional Transformer for Temporal Representation Learning , 2019, ArXiv.
[36] Dan Klein,et al. Learning to Compose Neural Networks for Question Answering , 2016, NAACL.
[37] Furu Wei,et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations , 2019, ICLR.
[38] Razvan Pascanu,et al. Deep reinforcement learning with relational inductive biases , 2018, ICLR.
[39] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[40] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).