Language-Aware Fine-Grained Object Representation for Referring Expression Comprehension
King Ngi Ngan | Hengcan Shi | Qingbo Wu | Fanman Meng | Hongliang Li | Heqian Qiu | Taijin Zhao
[1] Trevor Darrell,et al. Modeling Relationships in Referential Expressions with Compositional Modular Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[3] Kan Chen,et al. Zero-Shot Grounding of Objects From Natural Language Queries , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[4] Qi Wu,et al. Visual Grounding via Accumulated Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[5] King Ngi Ngan,et al. A2RMNet: Adaptively Aspect Ratio Multi-Scale Network for Object Detection in Remote Sensing Images , 2019, Remote Sens.
[6] Jiebo Luo,et al. A Fast and Accurate One-Stage Approach to Visual Grounding , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[7] Yin Li,et al. Learning Deep Structure-Preserving Image-Text Embeddings , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Kaiming He,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] King Ngi Ngan,et al. Hierarchical Context Features Embedding for Object Detection , 2020, IEEE Transactions on Multimedia.
[10] Hanwang Zhang,et al. Learning to Assemble Neural Module Tree Networks for Visual Grounding , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[11] Gregory Shakhnarovich,et al. Comprehension-Guided Referring Expressions , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Licheng Yu,et al. A Joint Speaker-Listener-Reinforcer Model for Referring Expressions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Xu Sun,et al. Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations , 2019, NeurIPS.
[14] Hongliang Li,et al. Key-Word-Aware Network for Referring Expression Image Segmentation , 2018, ECCV.
[15] Hanqing Lu,et al. Aligning Linguistic Words and Visual Semantic Units for Image Captioning , 2019, ACM Multimedia.
[16] Ramakant Nevatia,et al. Query-Guided Regression Network with Context Policy for Phrase Grounding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[17] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[18] Qi Tian,et al. CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[19] Larry S. Davis,et al. Modeling Context Between Objects for Referring Expression Understanding , 2016, ECCV.
[20] King Ngi Ngan,et al. Query Reconstruction Network for Referring Expression Image Segmentation , 2021, IEEE Transactions on Multimedia.
[21] Yi Li,et al. Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[22] Chen Qian,et al. A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Shih-Fu Chang,et al. Grounding Referring Expressions in Images by Variational Context , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[24] Zhou Zhao,et al. Multi-interaction Network with Object Relation for Video Question Answering , 2019, ACM Multimedia.
[25] Hengcan Shi,et al. Offset Bin Classification Network for Accurate Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Jiasen Lu,et al. Hierarchical Question-Image Co-Attention for Visual Question Answering , 2016, NIPS.
[27] Richang Hong,et al. Learning to Compose and Reason with Language Tree Structures for Visual Grounding , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[28] Licheng Yu,et al. Modeling Context in Referring Expressions , 2016, ECCV.
[29] Tieniu Tan,et al. Hierarchical Memory Modelling for Video Captioning , 2018, ACM Multimedia.
[30] Chao Zhang,et al. Referring Expression Comprehension with Semantic Visual Relationship and Word Mapping , 2019, ACM Multimedia.
[31] Yongdong Zhang,et al. Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching , 2019, ACM Multimedia.
[32] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[33] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[34] Meng Wang,et al. Question-Aware Tube-Switch Network for Video Question Answering , 2019, ACM Multimedia.
[35] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[36] Stephen Lin,et al. RepPoints: Point Set Representation for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[37] Lianli Gao,et al. Neighbourhood Watch: Referring Expression Comprehension via Language-Guided Graph Attention Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Ali Farhadi,et al. YOLOv3: An Incremental Improvement , 2018, ArXiv.
[39] Qi Wu,et al. Parallel Attention: A Unified Framework for Visual Object Discovery Through Dialogs and Queries , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[40] Yi Li,et al. R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.
[41] Kai Chen,et al. MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.
[42] Zi Huang,et al. Curiosity-driven Reinforcement Learning for Diverse Visual Paragraph Generation , 2019, ACM Multimedia.
[43] Yahong Han,et al. Explore Multi-Step Reasoning in Video Question Answering , 2018, CoVieW@MM.
[44] Shuqiang Jiang,et al. Attention-based Densely Connected LSTM for Video Captioning , 2019, ACM Multimedia.
[45] Ross B. Girshick,et al. Mask R-CNN , 2017, arXiv:1703.06870.
[46] Alan L. Yuille,et al. Generation and Comprehension of Unambiguous Object Descriptions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Jitao Sang,et al. Explainable Interaction-driven User Modeling over Knowledge Graph for Sequential Recommendation , 2019, ACM Multimedia.
[48] Xiaojuan Qi,et al. Self-boosted Gesture Interactive System with ST-Net , 2018, ACM Multimedia.
[49] Liang Wang,et al. Referring Expression Generation and Comprehension via Attributes , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[50] Licheng Yu,et al. MAttNet: Modular Attention Network for Referring Expression Comprehension , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[51] Guiguang Ding,et al. Cross-Modal Image-Text Retrieval with Semantic Consistency , 2019, ACM Multimedia.
[52] Yizhou Yu,et al. Dynamic Graph Attention for Referring Expression Comprehension , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[53] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).