VisualWord2Vec (Vis-W2V): Learning Visually Grounded Word Embeddings Using Abstract Scenes
暂无分享,去创建一个
José M. F. Moura | Devi Parikh | Satwik Kottur | Ramakrishna Vedantam | Devi Parikh | Ramakrishna Vedantam | Satwik Kottur
[1] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.
[2] C. Lawrence Zitnick,et al. Adopting Abstract Images for Semantic Scene Understanding , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Richard S. Zemel,et al. Image Question Answering: A Visual Semantic Embedding Model and a New Dataset , 2015, ArXiv.
[4] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Devi Parikh,et al. Image specificity , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[7] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[8] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..
[10] Steven Bird,et al. NLTK: The Natural Language Toolkit , 2002, ACL.
[11] Mario Fritz,et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[12] Xiao Lin,et al. Don't just listen, use your imagination: Leveraging visual common sense for non-visual tasks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Eric P. Xing,et al. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , 2014, ACL 2014.
[14] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[15] F ChenStanley,et al. An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.
[16] Subhransu Maji,et al. Action recognition from a distributed representation of pose and appearance , 2011, CVPR 2011.
[17] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[18] Carina Silberer,et al. Learning Grounded Meaning Representations with Autoencoders , 2014, ACL.
[19] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[20] C. Lawrence Zitnick,et al. Learning Common Sense through Visual Abstraction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[21] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[22] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[23] Xufeng Han,et al. Midge: Generating Descriptions of Images , 2012, INLG.
[24] Slava M. Katz,et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..
[25] C. Lawrence Zitnick,et al. Bringing Semantics into Focus Using Visual Abstraction , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[26] David F. Fouhey,et al. Predicting Object Dynamics in Scenes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[27] Mario Fritz,et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input , 2014, NIPS.
[28] Lukás Burget,et al. Neural network based language models for highly inflective languages , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[29] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[30] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[31] C. Lawrence Zitnick,et al. Zero-Shot Learning via Visual Abstraction , 2014, ECCV.
[32] Tamara L. Berg,et al. Baby Talk : Understanding and Generating Image Descriptions , 2011 .
[33] Wei Xu,et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question , 2015, NIPS.
[34] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res..
[35] Carina Silberer,et al. Models of Semantic Representation with Visual Attributes , 2013, ACL.
[36] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[37] Ali Farhadi,et al. VisKE: Visual knowledge extraction and question answering by visual verification of relation phrases , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Donald Geman,et al. Visual Turing test for computer vision systems , 2015, Proceedings of the National Academy of Sciences.
[39] Xinlei Chen,et al. Learning a Recurrent Visual Representation for Image Caption Generation , 2014, ArXiv.
[40] Bernt Schiele,et al. Translating Video Content to Natural Language Descriptions , 2013, 2013 IEEE International Conference on Computer Vision.
[41] Lucy Vanderwende,et al. Learning the Visual Interpretation of Sentences , 2013, 2013 IEEE International Conference on Computer Vision.
[42] Angeliki Lazaridou,et al. Combining Language and Vision with a Multimodal Skip-gram Model , 2015, NAACL.
[43] Ran Xu,et al. Improving Word Representations via Global Visual Context , 2014 .
[44] Steven Bird,et al. NLTK: The Natural Language Toolkit , 2002, ACL 2006.
[45] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Thomas Brox,et al. Discriminative Unsupervised Feature Learning with Convolutional Neural Networks , 2014, NIPS.
[47] Wei Xu,et al. Explain Images with Multimodal Recurrent Neural Networks , 2014, ArXiv.
[48] Yejin Choi,et al. Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.