Knowledge Aided Consistency for Weakly Supervised Phrase Grounding
暂无分享,去创建一个
Ramakant Nevatia | Jiyang Gao | Kan Chen | R. Nevatia | J. Gao | Kan Chen
[1] Wei Xu,et al. ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering , 2015, ArXiv.
[2] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Trevor Darrell,et al. Grounding of Textual Phrases in Images by Reconstruction , 2015, ECCV.
[4] Daniel P. Huttenlocher,et al. Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition , 2006, ECCV.
[5] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[6] Ivan Laptev,et al. Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[8] Jianguo Zhang,et al. The PASCAL Visual Object Classes Challenge , 2006 .
[9] Cordelia Schmid,et al. Weakly Supervised Learning of Interactions between Humans and Objects , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[10] Kan Chen,et al. AMC: Attention Guided Multi-modal Correlation Learning for Image Search , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] C. Lawrence Zitnick,et al. Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.
[12] Trevor Darrell,et al. Natural Language Object Retrieval , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.
[14] Yong Jae Lee,et al. Weakly-Supervised Visual Grounding of Phrases with Linguistic Structures , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[16] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[17] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[18] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[19] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[20] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[21] Ramakant Nevatia,et al. Motion-Appearance Co-memory Networks for Video Question Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[22] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[23] Armand Joulin,et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping , 2014, NIPS.
[24] Ramakant Nevatia,et al. MSRC: multimodal spatial regression with semantic context for phrase grounding , 2017, International Journal of Multimedia Information Retrieval.
[25] Shih-Fu Chang,et al. Visual Translation Embedding Network for Visual Relation Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Xiao Lin,et al. Leveraging Visual Question Answering for Image-Caption Ranking , 2016, ECCV.
[28] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.
[29] Trevor Darrell,et al. Open-vocabulary Object Retrieval , 2014, Robotics: Science and Systems.
[30] Koen E. A. van de Sande,et al. Selective Search for Object Recognition , 2013, International Journal of Computer Vision.
[31] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[32] Chen Sun,et al. VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[33] Zhe Gan,et al. StyleNet: Generating Attractive Visual Captions with Styles , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Ramakant Nevatia,et al. Query-Guided Regression Network with Context Policy for Phrase Grounding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[35] Mihai Surdeanu,et al. The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.
[36] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[37] Ramakant Nevatia,et al. TALL: Temporal Activity Localization via Language Query , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[38] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[39] ZissermanAndrew,et al. The Pascal Visual Object Classes Challenge , 2015 .
[40] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.