Francis Ferraro | Ting-Hao Huang | Lucy Vanderwende | Margaret Mitchell | Jacob Devlin | Michel Galley | Nasrin Mostafazadeh
[1] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[2] Matthias Scheutz,et al. Robust spoken instruction understanding for HRI , 2010, 2010 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[3] Luke S. Zettlemoyer,et al. A Joint Model of Language and Perception for Grounded Attribute Learning , 2012, ICML.
[4] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[5] Raymond J. Mooney,et al. Training a Multilingual Sportscaster: Using Perceptual Context to Learn Language , 2014, J. Artif. Intell. Res.
[6] David A. Shamma,et al. The New Data and New Challenges in Multimedia Research , 2015, ArXiv.
[7] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[8] Licheng Yu,et al. Visual Madlibs: Fill in the blank Image Generation and Question Answering , 2015, ArXiv.
[9] Wei Xu,et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering , 2015, NIPS.
[10] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[11] Barry K. Rosen,et al. Syntactic Complexity , 1974, Inf. Control.
[12] Lucy Vanderwende,et al. Learning the Visual Interpretation of Sentences , 2013, 2013 IEEE International Conference on Computer Vision.
[13] Victor H. Yngve,et al. A model and an hypothesis for language structure , 1960 .
[14] Luke S. Zettlemoyer,et al. See No Evil, Say No Evil: Description Generation from Densely Labeled Images , 2014, *SEMEVAL.
[15] Deb Roy,et al. Conversational Robots: Building Blocks for Grounding Word Meaning , 2003, HLT-NAACL 2003.
[16] Benjamin Van Durme,et al. Reporting Bias and Knowledge Extraction , 2013 .
[17] Gunhee Kim,et al. Joint photo stream and blog post summarization and exploration , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Cyrus Rashtchian,et al. Collecting Image Annotations Using Amazon’s Mechanical Turk , 2010, Mturk@HLT-NAACL.
[19] Alexei A. Efros,et al. Unbiased look at dataset bias , 2011, CVPR 2011.
[20] Bernt Schiele,et al. A database for fine grained activity detection of cooking activities , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[21] Mario Fritz,et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input , 2014, NIPS.
[22] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Trevor Darrell,et al. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description , 2017 .
[24] Henrik I. Christensen,et al. Situated Dialogue and Spatial Organization: What, Where… and Why? , 2007 .
[25] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Terry Winograd,et al. Understanding natural language , 1974 .
[27] Jeffrey Mark Siskind,et al. Grounded Language Learning from Video Described with Sentences , 2013, ACL.
[28] Arul Menezes,et al. MindNet: An Automatically-Created Lexical Resource , 2005, HLT.
[29] Jiebo Luo,et al. Discriminative Unsupervised Alignment of Natural Language Instructions with Corresponding Video Segments , 2015, HLT-NAACL.
[30] Bernt Schiele,et al. Grounding Action Descriptions in Videos , 2013, TACL.
[31] Vicente Ordonez,et al. Im2Text: Describing Images Using 1 Million Captioned Photographs , 2011, NIPS.
[32] Kevin Murphy,et al. What’s Cookin’? Interpreting Cooking Videos using Text, Speech and Vision , 2015, NAACL.
[33] Subhashini Venugopalan,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[34] Jayant Krishnamurthy,et al. Toward Interactive Grounded Language Acquisition , 2013, Robotics: Science and Systems.
[35] Yejin Choi,et al. Déjà Image-Captions: A Corpus of Expressive Descriptions in Repetition , 2015, NAACL.
[36] William B. Dolan,et al. Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.