MSRC: multimodal spatial regression with semantic context for phrase grounding
暂无分享,去创建一个
Ramakant Nevatia | Jiyang Gao | Kan Chen | Rama Kovvuri | R. Nevatia | Rama Kovvuri | J. Gao | Kan Chen
[1] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[2] Jianguo Zhang,et al. The PASCAL Visual Object Classes Challenge , 2006 .
[3] Ivan Laptev,et al. ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization , 2016, ECCV.
[4] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[5] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[6] C. Lawrence Zitnick,et al. Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.
[7] Trevor Darrell,et al. Natural Language Object Retrieval , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.
[9] Armand Joulin,et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping , 2014, NIPS.
[10] Koen E. A. van de Sande,et al. Selective Search for Object Recognition , 2013, International Journal of Computer Vision.
[11] Luke S. Zettlemoyer,et al. A Joint Model of Language and Perception for Grounded Attribute Learning , 2012, ICML.
[12] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[13] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.
[14] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, International Journal of Computer Vision.
[16] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[17] Wei Xu,et al. ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering , 2015, ArXiv.
[18] Larry S. Davis,et al. Modeling Context Between Objects for Referring Expression Understanding , 2016, ECCV.
[19] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[20] Trevor Darrell,et al. Grounding of Textual Phrases in Images by Reconstruction , 2015, ECCV.
[21] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Licheng Yu,et al. Modeling Context in Referring Expressions , 2016, ECCV.
[23] Jayant Krishnamurthy,et al. Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World , 2013, TACL.
[24] Ondrej Chum,et al. CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples , 2016, ECCV.
[25] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.
[26] Li Fei-Fei,et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[28] ZissermanAndrew,et al. The Pascal Visual Object Classes Challenge , 2015 .
[29] Rada Mihalcea,et al. Structured Matching for Phrase Localization , 2016, ECCV.
[30] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[31] Ramakant Nevatia,et al. Query-Guided Regression Network with Context Policy for Phrase Grounding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[32] Albert Gordo,et al. Deep Image Retrieval: Learning Global Representations for Image Search , 2016, ECCV.
[33] Р Ю Чуйков,et al. Обнаружение транспортных средств на изображениях загородных шоссе на основе метода Single shot multibox Detector , 2017 .
[34] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[35] Kan Chen,et al. AMC: Attention Guided Multi-modal Correlation Learning for Image Search , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[37] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.