Situation Recognition: Visual Semantic Role Labeling for Image Understanding
暂无分享,去创建一个
[1] Ioannis L. Kotsiopoulos. Language Semantics , 2002, ICEIMT.
[2] Martha Palmer,et al. From TreeBank to PropBank , 2002, LREC.
[3] Christopher R. Johnson,et al. Background to Framenet , 2003 .
[4] Jianguo Zhang,et al. The PASCAL Visual Object Classes Challenge , 2006 .
[5] Luc Van Gool,et al. The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.
[6] Andrea Vedaldi,et al. Objects in Context , 2007, 2007 IEEE 11th International Conference on Computer Vision.
[7] Fei-Fei Li,et al. What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.
[8] Larry S. Davis,et al. Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers , 2008, ECCV.
[9] Alexei A. Efros,et al. An empirical study of context in object detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[10] Cordelia Schmid,et al. Actions in context , 2009, CVPR.
[11] Serge J. Belongie,et al. Context based object categorization: A critical survey , 2010, Comput. Vis. Image Underst..
[12] Ivan Laptev,et al. Recognizing human actions in still images: a study of bag-of-features and part-based representations , 2010, BMVC.
[13] Fei-Fei Li,et al. Grouplet: A structured image representation for recognizing human and object interactions , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[14] Fei-Fei Li,et al. Modeling mutual context of object and human pose in human-object interaction activities , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[15] Cyrus Rashtchian,et al. Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.
[16] Subhransu Maji,et al. Action recognition from a distributed representation of pose and appearance , 2011, CVPR 2011.
[17] Vicente Ordonez,et al. Im2Text: Describing Images Using 1 Million Captioned Photographs , 2011, NIPS.
[18] Leonidas J. Guibas,et al. Human action recognition by learning bases of action attributes and parts , 2011, 2011 International Conference on Computer Vision.
[19] Carina Silberer,et al. Grounded Models of Semantic Representation , 2012, EMNLP.
[20] Mubarak Shah,et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.
[21] Dipanjan Das,et al. Semi-Supervised and Latent-Variable Models of Natural Language Semantics , 2012 .
[22] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[23] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res..
[24] Trevor Darrell,et al. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition , 2013, 2013 IEEE International Conference on Computer Vision.
[25] Yoav Goldberg,et al. A Dataset of Syntactic-Ngrams over Time from a Very Large Corpus of English Books , 2013, *SEMEVAL.
[26] Sanja Fidler,et al. What Are You Talking About? Text-to-Image Coreference , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[27] Guodong Guo,et al. A survey on still image based human action recognition , 2014, Pattern Recognit..
[28] Raffaella Bernardi,et al. TUHOI: Trento Universal Human Object Interaction Dataset , 2014, VL@COLING.
[29] Frank Keller,et al. Comparing Automatic Evaluation Measures for Image Description , 2014, ACL.
[30] Luc Van Gool,et al. The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.
[31] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[32] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[33] Wei Xu,et al. Explain Images with Multimodal Recurrent Neural Networks , 2014, ArXiv.
[34] Luke S. Zettlemoyer,et al. See No Evil, Say No Evil: Description Generation from Densely Labeled Images , 2014, *SEMEVAL.
[35] Li Fei-Fei,et al. Reasoning about Object Affordances in a Knowledge Base Representation , 2014, ECCV.
[36] Xinlei Chen,et al. Learning a Recurrent Visual Representation for Image Caption Generation , 2014, ArXiv.
[37] Jitendra Malik,et al. Visual Semantic Role Labeling , 2015, ArXiv.
[38] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Wei Xu,et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question , 2015, NIPS.
[41] Pietro Perona,et al. Describing Common Human Visual Actions in Images , 2015, BMVC.
[42] Christopher Ré,et al. Building a Large-scale Multimodal Knowledge Base for Visual Question Answering , 2015, ArXiv.
[43] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[44] Licheng Yu,et al. Visual Madlibs: Fill in the blank Image Generation and Question Answering , 2015, ArXiv.
[45] Kuzman Ganchev,et al. Semantic Role Labeling with Neural Network Factors , 2015, EMNLP.
[46] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Richard S. Zemel,et al. Image Question Answering: A Visual Semantic Embedding Model and a New Dataset , 2015, ArXiv.
[48] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[49] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[50] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.