Revisiting Image-Language Networks for Open-ended Phrase Detection.
Kate Saenko | Svetlana Lazebnik | Stan Sclaroff | Bryan A. Plummer | Kevin J. Shih | Yichen Li | Ke Xu
[1] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[2] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[3] Michael S. Bernstein,et al. Visual Relationship Detection with Language Priors , 2016, ECCV.
[4] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[5] Michael Isard,et al. A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics , 2012, International Journal of Computer Vision.
[6] Paul Clough,et al. The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems , 2006.
[7] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[8] Ross B. Girshick,et al. LVIS: A Dataset for Large Vocabulary Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Svetlana Lazebnik,et al. Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[10] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[11] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Gregory Shakhnarovich,et al. Comprehension-Guided Referring Expressions , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Jiebo Luo,et al. A Fast and Accurate One-Stage Approach to Visual Grounding , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[14] P. J. Huber. Robust Estimation of a Location Parameter , 1964.
[15] Cordelia Schmid,et al. Weakly-Supervised Alignment of Video with Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[16] Lior Wolf,et al. Associating neural word embeddings with deep image representations using Fisher Vectors , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Tao Mei,et al. Boosting Image Captioning with Attributes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[18] Liang Wang,et al. Referring Expression Generation and Comprehension via Attributes , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[19] Jeff A. Bilmes,et al. Deep Canonical Correlation Analysis , 2013, ICML.
[20] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[21] Trevor Darrell,et al. Grounding of Textual Phrases in Images by Reconstruction , 2015, ECCV.
[22] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[23] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.
[24] Shin'ichi Satoh,et al. Discriminative Learning of Open-Vocabulary Object Retrieval and Localization by Negative Phrase Augmentation , 2017, EMNLP.
[25] Markus H. Gross,et al. Neural Sequential Phrase Grounding (SeqGROUND) , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Abhinav Gupta,et al. Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Trevor Darrell,et al. Natural Language Object Retrieval , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[29] Robinson Piramuthu,et al. Conditional Image-Text Embedding Networks , 2017, ECCV.
[30] Ross B. Girshick,et al. Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Jiyang Gao,et al. MSRC: Multimodal Spatial Regression with Semantic Context for Phrase Grounding , 2017, ICMR.
[32] Xi Chen,et al. Stacked Cross Attention for Image-Text Matching , 2018, ECCV.
[33] Yuting Zhang,et al. Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[35] Alan L. Yuille,et al. Generation and Comprehension of Unambiguous Object Descriptions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Chenxi Liu,et al. Attention Correctness in Neural Image Captioning , 2016, AAAI.
[37] Jinjun Xiong,et al. Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts , 2018, NIPS.
[38] Leonid Sigal,et al. G3raphGround: Graph-Based Language Grounding , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[39] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[40] Rada Mihalcea,et al. Structured Matching for Phrase Localization , 2016, ECCV.
[41] Ramakant Nevatia,et al. Query-Guided Regression Network with Context Policy for Phrase Grounding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[42] Liwei Wang,et al. Learning Two-Branch Neural Networks for Image-Text Matching Tasks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[43] Rodrigo C. Barros,et al. Bidirectional Retrieval Made Simple , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[44] H. Hotelling. Relations Between Two Sets of Variates , 1936.
[45] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Kate Saenko,et al. Language Features Matter: Effective Language Representations for Vision-Language Tasks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[47] Li Fei-Fei,et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Fang Zhao,et al. Deep Attribute-preserving Metric Learning for Natural Language Object Retrieval , 2017, ACM Multimedia.
[50] Michael S. Bernstein,et al. Image retrieval using scene graphs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[52] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[53] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[54] Christopher Kanan,et al. VQD: Visual Query Detection In Natural Scenes , 2019, NAACL.
[55] Ross B. Girshick,et al. Fast R-CNN , 2015, arXiv:1504.08083.