Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Vicente Ordonez | Song Feng | Hui Wu | Paola Cascante-Bonilla | Fuwen Tan | Xiaoxiao Guo