Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
Qingming Huang | Zheng-Jun Zha | Shuhui Wang | Dechao Meng | Xuejing Liu | Liang Li
[1] Paul Clough, et al. The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems, 2006.
[2] Razvan Pascanu, et al. On the Number of Linear Regions of Deep Neural Networks, 2014, NIPS.
[3] Qi Wu, et al. Image Captioning and Visual Question Answering Based on Attributes and External Knowledge, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[4] Qi Tian, et al. Structured Stochastic Recurrent Network for Linguistic Video Prediction, 2019, ACM Multimedia.
[5] Fang Zhao, et al. Weakly Supervised Phrase Localization with Multi-scale Anchored Transformer Network, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[6] Alan L. Yuille, et al. Generation and Comprehension of Unambiguous Object Descriptions, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Hugo Jair Escalante, et al. The segmented and annotated IAPR TC-12 benchmark, 2010, Comput. Vis. Image Underst.
[8] Qingming Huang, et al. Learning Hierarchical Semantic Description Via Mixed-Norm Regularization for Image Understanding, 2012, IEEE Transactions on Multimedia.
[9] Qingming Huang, et al. Attentive Recurrent Neural Network for Weak-supervised Multi-label Image Classification, 2018, ACM Multimedia.
[10] Larry S. Davis, et al. Modeling Context Between Objects for Referring Expression Understanding, 2016, ECCV.
[11] Trevor Darrell, et al. Grounding of Textual Phrases in Images by Reconstruction, 2015, ECCV.
[12] Trevor Darrell, et al. Long-term recurrent convolutional networks for visual recognition and description, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Ali Farhadi, et al. IQA: Visual Question Answering in Interactive Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[14] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Shih-Fu Chang, et al. Grounding Referring Expressions in Images by Variational Context, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[16] C. Lawrence Zitnick, et al. Edge Boxes: Locating Object Proposals from Edges, 2014, ECCV.
[17] Trevor Darrell, et al. Natural Language Object Retrieval, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Liang Wang, et al. Referring Expression Generation and Comprehension via Attributes, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[19] Licheng Yu, et al. MAttNet: Modular Attention Network for Referring Expression Comprehension, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[20] Ramakant Nevatia, et al. Query-Guided Regression Network with Context Policy for Phrase Grounding, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[21] Vicente Ordonez, et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes, 2014, EMNLP.
[22] Gregory Shakhnarovich, et al. Comprehension-Guided Referring Expressions, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Yin Li, et al. Learning Deep Structure-Preserving Image-Text Embeddings, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Qi Tian, et al. SkeletonNet: A Hybrid Network With a Skeleton-Embedding Process for Multi-View Image Representation Learning, 2019, IEEE Transactions on Multimedia.
[25] Kees van Deemter, et al. Generating Expressions that Refer to Visible Objects, 2013, NAACL.
[26] Samy Bengio, et al. Show and tell: A neural image caption generator, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Zheng-Jun Zha, et al. Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding, 2019, ACM Multimedia.
[28] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[29] Luke S. Zettlemoyer, et al. Learning Distributions over Logical Forms for Referring Expression Generation, 2013, EMNLP.
[30] Trevor Darrell, et al. Open-vocabulary Object Retrieval, 2014, Robotics: Science and Systems.
[31] Yong Jae Lee, et al. Weakly-Supervised Visual Grounding of Phrases with Linguistic Structures, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Licheng Yu, et al. A Joint Speaker-Listener-Reinforcer Model for Referring Expressions, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Vibhav Vineet, et al. ImageSpirit: Verbal Guided Image Parsing, 2013, ACM Trans. Graph.
[34] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[35] Stefan Lee, et al. Embodied Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[36] Qingming Huang, et al. Partial-Duplicate Image Retrieval via Saliency-Guided Visual Matching, 2013, IEEE MultiMedia.
[37] Qi Tian, et al. Multimodal Similarity Gaussian Process Latent Variable Model, 2017, IEEE Transactions on Image Processing.
[38] Lin Ma, et al. Real-Time Referring Expression Comprehension by Single-Stage Grounding Network, 2018, ArXiv.
[39] Ramakant Nevatia, et al. Knowledge Aided Consistency for Weakly Supervised Phrase Grounding, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[40] Qi Tian, et al. Joint Global and Co-Attentive Representation Learning for Image-Sentence Retrieval, 2018, ACM Multimedia.
[41] Qi Wu, et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[42] Qingming Huang, et al. Online Asymmetric Metric Learning With Multi-Layer Similarity Aggregation for Cross-Modal Retrieval, 2019, IEEE Transactions on Image Processing.
[43] Licheng Yu, et al. Modeling Context in Referring Expressions, 2016, ECCV.
[44] Jivko Sinapov, et al. Guiding Interaction Behaviors for Multi-modal Grounded Language Learning, 2017, RoboNLP@ACL.
[45] Tao Mei, et al. Boosting Image Captioning with Attributes, 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[46] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.