Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
[1] Tamara L. Berg,et al. Names and faces in the news , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004).
[2] David Schlangen,et al. Incremental Reference Resolution: The Task, Metrics for Evaluation, and a Bayesian Filtering Model that is Sensitive to Disfluencies , 2009, SIGDIAL Conference.
[3] Jeffrey Nichols,et al. Interpreting Written How-To Instructions , 2009, IJCAI.
[4] Larry S. Davis,et al. Understanding videos, constructing plots: learning a visually grounded storyline model from annotated videos , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[5] Tat-Seng Chua,et al. An efficient sparse metric learning in high-dimensional space via l1-penalized log-determinant regularization , 2009, ICML '09.
[6] Jean Ponce,et al. Automatic annotation of human actions in video , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[7] Cyrus Rashtchian,et al. Cross-Caption Coreference Resolution for Automatic Image Understanding , 2010, CoNLL.
[8] Stefanie Tellex,et al. Toward understanding natural language directions , 2010, 2010 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[9] Heeyoung Lee,et al. Stanford’s Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task , 2011, CoNLL Shared Task.
[10] C. Lawrence Zitnick,et al. Bringing Semantics into Focus Using Visual Abstraction , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[11] Fei-Fei Li,et al. Video Event Understanding Using Natural Language Descriptions , 2013, 2013 IEEE International Conference on Computer Vision.
[12] Dan Klein,et al. Easy Victories and Uphill Battles in Coreference Resolution , 2013, EMNLP.
[13] Stefanie Tellex,et al. Learning perceptually grounded word meanings from unaligned parallel data , 2012, Machine Learning.
[14] Cordelia Schmid,et al. Finding Actors and Actions in Movies , 2013, 2013 IEEE International Conference on Computer Vision.
[15] Sanja Fidler,et al. A Sentence Is Worth a Thousand Pixels , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[16] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[17] Chenliang Xu,et al. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[18] Jayant Krishnamurthy,et al. Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World , 2013, TACL.
[19] Quoc V. Le,et al. Grounded Compositional Semantics for Finding and Describing Images with Sentences , 2014, TACL.
[20] Sanja Fidler,et al. What Are You Talking About? Text-to-Image Coreference , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[21] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[22] Mihai Surdeanu,et al. The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.
[23] Earl J. Wagner,et al. Cooking with Semantics , 2014, ACL 2014.
[24] Sanja Fidler,et al. Visual Semantic Search: Retrieving Videos via Complex Textual Queries , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[25] Deva Ramanan,et al. Parsing Videos of Actions with Segmental Grammars , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[26] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[27] Jonas Kuhn,et al. Learning Structured Perceptrons for Coreference Resolution with Latent Antecedents and Non-local Features , 2014, ACL.
[28] Fei-Fei Li,et al. Linking People in Videos with "Their" Names Using Coreference Resolution , 2014, ECCV.
[29] Alexander G. Hauptmann,et al. Instructional Videos for Unsupervised Harvesting and Learning of Action Examples , 2014, ACM Multimedia.
[30] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[31] Shinsuke Mori,et al. A Framework for Procedural Text Understanding , 2015, IWPT.
[32] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[33] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[34] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, International Journal of Computer Vision.
[35] Wei Chen,et al. Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework , 2015, AAAI.
[36] Li Fei-Fei,et al. Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval , 2015, VL@EMNLP.
[37] Nizar Habash,et al. Predicting the Structure of Cooking Recipes , 2015, EMNLP.
[38] Jiebo Luo,et al. Discriminative Unsupervised Alignment of Natural Language Instructions with Corresponding Video Segments , 2015, HLT-NAACL.
[39] Cordelia Schmid,et al. Weakly-Supervised Alignment of Video with Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[40] Kevin Murphy,et al. What’s Cookin’? Interpreting Cooking Videos using Text, Speech and Vision , 2015, NAACL.
[41] Sanja Fidler,et al. Skip-Thought Vectors , 2015, NIPS.
[42] Silvio Savarese,et al. Unsupervised Semantic Parsing of Video Collections , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[43] Michael Strube,et al. Latent Structures for Coreference Resolution , 2015, TACL.
[44] Michael S. Bernstein,et al. Image retrieval using scene graphs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Dan Klein,et al. Alignment-Based Compositional Semantics for Instruction Following , 2015, EMNLP.
[46] Lei Chen,et al. Deep Structured Models For Group Activity Recognition , 2015, BMVC.
[47] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Yejin Choi,et al. Mise en Place: Unsupervised Interpretation of Instructional Recipes , 2015, EMNLP.
[49] Ali Farhadi,et al. Generating Notifications for Missing Actions: Don't Forget to Turn the Lights Off! , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[50] Trevor Darrell,et al. Grounding of Textual Phrases in Images by Reconstruction , 2015, ECCV.
[51] Licheng Yu,et al. Modeling Context in Referring Expressions , 2016, ECCV.
[52] Xinlei Chen,et al. Learning Visual Storylines with Skipping Recurrent Neural Networks , 2016, ECCV.
[53] Dhruv Batra,et al. Sort Story: Sorting Jumbled Images and Captions into Stories , 2016, EMNLP.
[54] Greg Mori,et al. Structure Inference Machines: Recurrent Neural Networks for Analyzing Relations in Group Activity Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Percy Liang,et al. Simpler Context-Dependent Logical Forms via Model Projections , 2016, ACL.
[56] Larry S. Davis,et al. Modeling Context Between Objects for Referring Expression Understanding , 2016, ECCV.
[57] Francis Ferraro,et al. Visual Storytelling , 2016, NAACL.
[58] Song-Chun Zhu,et al. Jointly Learning Grounded Task Structures from Language Instruction and Visual Demonstration , 2016, EMNLP.
[59] Changsong Liu,et al. Grounded Semantic Role Labeling , 2016, NAACL.
[60] Song-Chun Zhu,et al. Robot learning with a spatial, temporal, and causal and-or graph , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).
[61] Li Fei-Fei,et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[62] Ivan Laptev,et al. Unsupervised Learning from Narrated Instruction Videos , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Alan L. Yuille,et al. Generation and Comprehension of Unambiguous Object Descriptions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[65] Ivan Laptev,et al. Joint Discovery of Object States and Manipulation Actions , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[66] Trevor Darrell,et al. Modeling Relationships in Referential Expressions with Compositional Modular Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).