Visual Referring Expression Recognition: What Do Systems Actually Learn?
Louis-Philippe Morency | Volkan Cirik | Taylor Berg-Kirkpatrick