Unsupervised Textual Grounding: Linking Words to Image Concepts
暂无分享,去创建一个
[1] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[2] Wei Liu,et al. Learning Distance Metrics with Contextual Constraints for Image Retrieval , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[3] Christoph H. Lampert,et al. Efficient Subwindow Search: A Branch and Bound Framework for Object Localization , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[4] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.
[5] Luc Van Gool,et al. Fast PRISM: Branch and Bound Hough Transform for Object Class Detection , 2011, International Journal of Computer Vision.
[6] Raquel Urtasun,et al. Efficient Exact Inference for 3D Indoor Scene Understanding , 2012, ECCV.
[7] Luke S. Zettlemoyer,et al. A Joint Model of Language and Perception for Grounded Attribute Learning , 2012, ICML.
[8] Andrew Zisserman,et al. Multiple queries for large scale specific object retrieval , 2012, BMVC.
[9] Thomas Deselaers,et al. Weakly Supervised Localization and Learning with Generic Knowledge , 2012, International Journal of Computer Vision.
[10] Koen E. A. van de Sande,et al. Selective Search for Object Recognition , 2013, International Journal of Computer Vision.
[11] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[12] Lucy Vanderwende,et al. Learning the Visual Interpretation of Sentences , 2013, 2013 IEEE International Conference on Computer Vision.
[13] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[14] Jeffrey Mark Siskind,et al. Grounded Language Learning from Video Described with Sentences , 2013, ACL.
[15] Jayant Krishnamurthy,et al. Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World , 2013, TACL.
[16] Sanja Fidler,et al. What Are You Talking About? Text-to-Image Coreference , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[17] Junjie Yan,et al. The Fastest Deformable Part Model for Object Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[18] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[19] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[20] Lior Wolf,et al. Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation , 2014, ArXiv.
[21] Cordelia Schmid,et al. Spatio-temporal Object Detection Proposals , 2014, ECCV.
[22] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[23] Sanja Fidler,et al. Visual Semantic Search: Retrieving Videos via Complex Textual Queries , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[24] Armand Joulin,et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping , 2014, NIPS.
[25] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[26] Trevor Darrell,et al. Open-vocabulary Object Retrieval , 2014, Robotics: Science and Systems.
[27] C. Lawrence Zitnick,et al. Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.
[28] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[29] Svetlana Lazebnik,et al. Improving Image-Sentence Embeddings Using Large Weakly Annotated Photo Collections , 2014, ECCV.
[30] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[31] Thomas Brox,et al. Striving for Simplicity: The All Convolutional Net , 2014, ICLR.
[32] Iasonas Kokkinos,et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.
[33] Ali Farhadi,et al. VisKE: Visual knowledge extraction and question answering by visual verification of relation phrases , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[35] Wei Xu,et al. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) , 2014, ICLR.
[36] Michael S. Bernstein,et al. Image retrieval using scene graphs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Bolei Zhou,et al. Object Detectors Emerge in Deep Scene CNNs , 2014, ICLR.
[38] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[39] Mario Fritz,et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[40] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[41] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[42] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Trevor Darrell,et al. Grounding of Textual Phrases in Images by Reconstruction , 2015, ECCV.
[44] Saurabh Singh,et al. Where to Look: Focus Regions for Visual Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Kate Saenko,et al. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering , 2015, ECCV.
[46] Alexander J. Smola,et al. Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Dan Klein,et al. Learning to Compose Neural Networks for Question Answering , 2016, NAACL.
[48] Lin Ma,et al. Learning to Answer Questions from Image Using Convolutional Neural Network , 2015, AAAI.
[49] Trevor Darrell,et al. Natural Language Object Retrieval , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Michael S. Bernstein,et al. Visual7W: Grounded Question Answering in Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Alan L. Yuille,et al. Generation and Comprehension of Unambiguous Object Descriptions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Richard Socher,et al. Dynamic Memory Networks for Visual and Textual Question Answering , 2016, ICML.
[53] Jiasen Lu,et al. Hierarchical Question-Image Co-Attention for Visual Question Answering , 2016, NIPS.
[54] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[55] Alexander M. Rush,et al. Structured Attention Networks , 2017, ICLR.
[56] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Tamir Hazan,et al. High-Order Attention Models for Visual Question Answering , 2017, NIPS.
[58] Ali Farhadi,et al. YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Vahid Kazemi,et al. Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering , 2017, ArXiv.
[60] Jinjun Xiong,et al. Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts , 2018, NIPS.
[61] Jung-Woo Ha,et al. Dual Attention Networks for Multimodal Reasoning and Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).