Chen Liang | Yawei Luo | Yu Wu | Yi Yang
[1] Arnold W. M. Smeulders, et al. Tracking by Natural Language Specification, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Chenliang Xu, et al. Can humans fly? Action understanding with multiple classes of actors, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Wei Xu, et al. Bidirectional LSTM-CRF Models for Sequence Tagging, 2015, ArXiv.
[4] Jiebo Luo, et al. Image Captioning with Semantic Attention, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Cheng Deng, et al. Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[6] Vicente Ordonez, et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes, 2014, EMNLP.
[7] Kan Chen, et al. Video Object Grounding Using Semantic Roles in Language Description, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Wei Zhang, et al. Segment as Points for Efficient Online Multi-Object Tracking and Segmentation, 2020, ECCV.
[9] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[10] John D E Gabrieli, et al. Neural correlates of actual and predicted memory formation, 2005, Nature Neuroscience.
[11] Cees Snoek, et al. Actor and Action Video Segmentation from a Sentence, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[12] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[13] Chenxi Liu, et al. Recurrent Multimodal Interaction for Referring Image Segmentation, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[14] Heng Tao Shen, et al. Video Captioning With Attention-Based LSTM and Semantic Consistency, 2017, IEEE Transactions on Multimedia.
[15] Yunchao Wei, et al. Referring Image Segmentation via Cross-Modal Progressive Comprehension, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Cordelia Schmid, et al. Towards Understanding Action Recognition, 2013, 2013 IEEE International Conference on Computer Vision.
[17] Yi Yang, et al. Decoupled Novel Object Captioner, 2018, ACM Multimedia.
[18] Yang Zhao, et al. Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Xin Wang, et al. Video Captioning via Hierarchical Reinforcement Learning, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[20] Yunchao Wei, et al. Referring Image Segmentation by Generative Adversarial Learning, 2020, IEEE Transactions on Multimedia.
[21] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[22] Jiebo Luo, et al. A Fast and Accurate One-Stage Approach to Visual Grounding, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[23] Licheng Yu, et al. MAttNet: Modular Attention Network for Referring Expression Comprehension, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[24] Jiebo Luo, et al. Grounding-Tracking-Integration, 2019, ArXiv.
[25] Leonid Sigal, et al. G3raphGround: Graph-Based Language Grounding, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[26] Jiasen Lu, et al. Hierarchical Question-Image Co-Attention for Visual Question Answering, 2016, NIPS.
[27] Hwann-Tzong Chen, et al. See-Through-Text Grouping for Referring Image Segmentation, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[28] Huchuan Lu, et al. Bi-Directional Relationship Inferring Network for Referring Image Segmentation, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Mubarak Shah, et al. Visual-Textual Capsule Routing for Text-Based Video Segmentation, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Qi Wu, et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[31] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[32] Subhransu Maji, et al. PhraseCut: Language-Based Image Segmentation in the Wild, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Hao Wang, et al. Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries, 2020, AAAI.
[34] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Trevor Darrell, et al. Fully Convolutional Networks for Semantic Segmentation, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[36] Pablo Arbeláez, et al. Dynamic Multimodal Instance Segmentation guided by natural language queries, 2018, ECCV.
[37] G. T. Buswell. How People Look At Pictures: A Study Of The Psychology Of Perception In Art, 2012.
[38] Chunhua Shen, et al. Conditional Convolutions for Instance Segmentation, 2020, ECCV.
[39] Xiaojun Chang, et al. Vision-Dialog Navigation by Exploring Cross-Modal Memory, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Ming-Hsuan Yang, et al. Referring Expression Object Segmentation with Caption-Aware Consistency, 2019, BMVC.
[41] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.
[42] Yan Yan, et al. Dual Attention Matching for Audio-Visual Event Localization, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[43] J. Wolfe, et al. Five factors that guide attention in visual search, 2017, Nature Human Behaviour.
[44] Yang Wang, et al. Cross-Modal Self-Attention Network for Referring Image Segmentation, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Guanbin Li, et al. Linguistic Structure Guided Context Modeling for Referring Image Segmentation, 2020, ECCV.
[46] Harold W. Kuhn, et al. The Hungarian method for the assignment problem, 1955, 50 Years of Integer Programming.
[47] Trevor Darrell, et al. Segmentation from Natural Language Expressions, 2016, ECCV.
[48] Y. Trope, et al. Person-centered cognition: The presence of people in a visual scene promotes relational reasoning, 2020.
[49] Qi Tian, et al. Polar Relative Positional Encoding for Video-Language Segmentation, 2020, IJCAI.
[50] Taylor R. Hayes, et al. Meaning-based guidance of attention in scenes as revealed by meaning maps, 2017, Nature Human Behaviour.