Paper Information - Visually Grounded Meaning Representations

Visually Grounded Meaning Representations

In this paper we address the problem of grounding distributional representations of lexical meaning. We introduce a new model which uses stacked autoencoders to learn higher-level representations from textual and visual input. The visual modality is encoded via vectors of attributes obtained automatically from images. We create a new large-scale taxonomy of 600 visual attributes representing more than 500 concepts and 700K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We evaluate our model on its ability to simulate word similarity judgments and concept categorization. On both tasks, our model yields a better fit to behavioral data compared to baselines and related models which either rely on a single modality or do not make use of attribute-based input.
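The general idea of stacking autoencoders over two modalities can be illustrated with a small sketch. This is not the authors' implementation: the class `TinyAutoencoder`, the function `bimodal_codes`, the layer sizes, the learning rate, and the random toy vectors are all assumptions made for illustration, and the sketch leaves out details a full model would likely need (for example input corruption and joint fine-tuning). It trains one autoencoder per modality, then a third autoencoder on the concatenated hidden codes, and compares two concepts by cosine similarity of their joint codes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyAutoencoder:
    """Hypothetical one-hidden-layer autoencoder with tied weights."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.b_o = np.zeros(n_in)

    def encode(self, X):
        return sigmoid(X @ self.W + self.b_h)

    def decode(self, H):
        return sigmoid(H @ self.W.T + self.b_o)

    def fit(self, X, epochs=200, lr=0.1):
        n = X.shape[0]
        for _ in range(epochs):
            H = self.encode(X)
            R = self.decode(H)
            # Gradient of the mean cross-entropy reconstruction loss
            # with respect to the output pre-activations.
            d_out = (R - X) / n
            # Backpropagate through the decoder into the hidden layer.
            d_hid = (d_out @ self.W) * H * (1.0 - H)
            # Tied weights: sum the encoder and decoder contributions.
            grad_W = X.T @ d_hid + (H.T @ d_out).T
            self.W -= lr * grad_W
            self.b_h -= lr * d_hid.sum(axis=0)
            self.b_o -= lr * d_out.sum(axis=0)
        return self

def bimodal_codes(text_X, visual_X, n_hidden=4, seed=0):
    """Greedy layer-wise training: one autoencoder per modality,
    then a joint autoencoder over the concatenated hidden codes."""
    rng = np.random.default_rng(seed)
    text_ae = TinyAutoencoder(text_X.shape[1], n_hidden, rng).fit(text_X)
    vis_ae = TinyAutoencoder(visual_X.shape[1], n_hidden, rng).fit(visual_X)
    joint_in = np.hstack([text_ae.encode(text_X), vis_ae.encode(visual_X)])
    joint_ae = TinyAutoencoder(joint_in.shape[1], n_hidden, rng).fit(joint_in)
    return joint_ae.encode(joint_in)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy data (assumption): 6 concepts, 10 textual and 8 visual-attribute
# dimensions, with values in [0, 1] standing in for real feature vectors.
data_rng = np.random.default_rng(1)
text_X = data_rng.random((6, 10))
visual_X = data_rng.random((6, 8))

codes = bimodal_codes(text_X, visual_X)
sim = cosine(codes[0], codes[1])
print(codes.shape, round(sim, 3))
```

In an evaluation like the one the abstract describes, similarities computed this way for many word pairs would be compared against human similarity judgments; the toy vectors here carry no real semantic signal, so the printed value is only a smoke test.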
