Object Referring in Videos with Language and Human Gaze
暂无分享,去创建一个
[1] Kaiming He,et al. Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[2] Mario Fritz,et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[3] Marek Kowalski,et al. Deep Alignment Network: A Convolutional Neural Network for Robust Face Alignment , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[4] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[5] Alan L. Yuille,et al. Generation and Comprehension of Unambiguous Object Descriptions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Luc Van Gool,et al. Object Referring in Visual Scene with Spoken Language , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).
[7] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[8] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[9] Trevor Darrell,et al. Segmentation from Natural Language Expressions , 2016, ECCV.
[10] Michael L. Geis,et al. Syntax and Semantics. Volume 3 : Speech Acts , 1976 .
[11] B. S. Manjunath,et al. From Where and How to What We See , 2013, 2013 IEEE International Conference on Computer Vision.
[12] C. Lawrence Zitnick,et al. Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.
[13] Trevor Darrell,et al. Natural Language Object Retrieval , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[15] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[16] Pia Knoeferle,et al. Can Speaker Gaze Modulate Syntactic Structuring and Thematic Role Assignment during Spoken Sentence Comprehension? , 2012, Front. Psychology.
[17] Luca Bertinetto,et al. End-to-End Representation Learning for Correlation Filter Based Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[19] B. S. Manjunath,et al. Eye tracking assisted extraction of attentionally important objects from videos , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Yusuke Sugano,et al. Seeing with Humans: Gaze-Assisted Neural Image Captioning , 2016, ArXiv.
[21] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, International Journal of Computer Vision.
[22] Richard S. Zemel,et al. Exploring Models and Data for Image Question Answering , 2015, NIPS.
[23] Ramakant Nevatia,et al. Query-Guided Regression Network with Context Policy for Phrase Grounding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[24] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[25] Paul A. Viola,et al. Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.
[26] Licheng Yu,et al. Modeling Context in Referring Expressions , 2016, ECCV.
[27] Frank Keller,et al. Training Object Class Detectors from Eye Tracking Data , 2014, ECCV.
[28] Yale Song,et al. TGIF: A New Dataset and Benchmark on Animated GIF Description , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Larry S. Davis,et al. Modeling Context Between Objects for Referring Expression Understanding , 2016, ECCV.
[30] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[31] Trevor Darrell,et al. Grounding of Textual Phrases in Images by Reconstruction , 2015, ECCV.
[32] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[34] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[36] Trevor Darrell,et al. Localizing Moments in Video with Natural Language , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[37] Pia Knoeferle,et al. Referential Gaze Makes a Difference in Spoken Language Comprehension: Human Speaker vs. Virtual Agent Listener Gaze , 2017, AVSP.
[38] Jinjun Xiong,et al. Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts , 2018, NIPS.
[39] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[40] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[42] Pia Knoeferle,et al. Peripheral speaker gaze facilitates spoken language comprehension: syntactic structuring and thematic role asssignment in German , 2011 .
[43] Yin Li,et al. Learning Deep Structure-Preserving Image-Text Embeddings , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Li Fei-Fei,et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[46] Svetlana Lazebnik,et al. Phrase Localization and Visual Relationship Detection with Comprehensive Linguistic Cues , 2016, ArXiv.
[47] Antonio Torralba,et al. Where are they looking? , 2015, NIPS.
[48] Rada Mihalcea,et al. Structured Matching for Phrase Localization , 2016, ECCV.
[49] Rakesh Gupta,et al. Situated multi-modal dialog system in vehicles , 2013, GazeIn '13.
[50] Jitendra Malik,et al. Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.
[51] Yifan Peng,et al. Studying Relationships between Human Gaze, Description, and Computer Vision , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[52] Jorma Laaksonen,et al. Towards gaze-based video annotation , 2016, 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA).
[53] Yongdong Zhang,et al. Multi-task deep visual-semantic embedding for video thumbnail selection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Dan Klein,et al. Learning to Compose Neural Networks for Question Answering , 2016, NAACL.
[55] Wojciech Matusik,et al. Eye Tracking for Everyone , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Luc Van Gool,et al. Fast Optical Flow Using Dense Inverse Search , 2016, ECCV.
[57] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[58] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).