Object Referring in Visual Scene with Spoken Language
[1] Ankush Gupta,et al. Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Thomas Brox,et al. FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[3] Larry S. Davis,et al. Modeling Context Between Objects for Referring Expression Understanding , 2016, ECCV.
[4] Trevor Darrell,et al. Grounding of Textual Phrases in Images by Reconstruction , 2015, ECCV.
[5] Daniel Marcu,et al. Natural Language Communication with Robots , 2016, NAACL.
[6] Licheng Yu,et al. Modeling Context in Referring Expressions , 2016, ECCV.
[7] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[8] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[9] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[10] Justin Salamon,et al. A Dataset and Taxonomy for Urban Sound Research , 2014, ACM Multimedia.
[11] C. Lawrence Zitnick,et al. Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.
[12] Trevor Darrell,et al. Natural Language Object Retrieval , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] D. Holdstock. Past, present--and future? , 2005, Medicine, conflict, and survival.
[14] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[15] Trevor Darrell,et al. Segmentation from Natural Language Expressions , 2016, ECCV.
[16] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[17] J. Jacko,et al. The human-computer interaction handbook: fundamentals, evolving technologies and emerging applications , 2002.
[18] P R Cohen,et al. The role of voice input for human-machine communication. , 1995, Proceedings of the National Academy of Sciences of the United States of America.
[19] Dengxin Dai. Towards Cost-Effective and Performance-Aware Vision Algorithms , 2016.
[20] Rada Mihalcea,et al. Structured Matching for Phrase Localization , 2016, ECCV.
[21] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[22] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Vibhav Vineet,et al. ImageSpirit: Verbal Guided Image Parsing , 2013, ACM Trans. Graph..
[24] Kaiming He,et al. Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[25] Tetsuya Ogata,et al. Audio-visual speech recognition using deep learning , 2014, Applied Intelligence.
[27] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[28] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[29] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[30] Srinivas Bangalore,et al. Qme! : A Speech-based Question-Answering system on Mobile Devices , 2010, HLT-NAACL.
[31] Vladimir A. Kulyukin,et al. On natural language dialogue with assistive robots , 2006, HRI '06.
[32] Alan L. Yuille,et al. Generation and Comprehension of Unambiguous Object Descriptions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[34] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[35] Koen E. A. van de Sande,et al. Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.
[36] Marie-Francine Moens,et al. Speech-Based Visual Question Answering , 2017, ArXiv.
[37] Khalil Sima'an,et al. Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship , 2006, Computational Linguistics.
[38] Luke S. Zettlemoyer,et al. Learning from Unscripted Deictic Gesture and Language for Human-Robot Interactions , 2014, AAAI.
[39] Ming Liu,et al. AVICAR: audio-visual speech corpus in a car environment , 2004, INTERSPEECH.
[40] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[41] Stefanie Tellex,et al. Grounding Verbs of Motion in Natural Language Commands to Robots , 2010, ISER.
[42] Stanley Peters,et al. Conversational In-Vehicle Dialog Systems: The past, present, and future , 2016, IEEE Signal Processing Magazine.
[43] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Grzegorz Chrupala,et al. Representations of language in a model of visually grounded speech signal , 2017, ACL.
[45] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[46] Svetlana Lazebnik,et al. Phrase Localization and Visual Relationship Detection with Comprehensive Linguistic Cues , 2016, ArXiv.
[47] Gierad Laput,et al. PixelTone: a multimodal interface for image editing , 2013, CHI.
[48] Rohini K. Srihari,et al. Show&Tell: A Semi-Automated Image Annotation System , 2000, IEEE Multim..
[49] James R. Glass,et al. Unsupervised Learning of Spoken Language with Visual Context , 2016, NIPS.
[50] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Kevin Lee,et al. Tell me Dave: Context-sensitive grounding of natural language to manipulation instructions , 2014, Int. J. Robotics Res..