Improving Multi-Modal Representations Using Image Dispersion: Why Less is Sometimes More

Models that learn semantic representations from both linguistic and perceptual input outperform text-only models in many contexts and better reflect human concept acquisition. However, experiments suggest that while the inclusion of perceptual input improves representations of certain concepts, it degrades the representations of others. We propose an unsupervised method to determine whether to include perceptual input for a concept, and show that it significantly improves the ability of multi-modal models to learn and represent word meanings. The method relies solely on image data, and can be applied to a variety of other NLP tasks.
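
The intuition behind the dispersion criterion is that a concrete concept ("chocolate") tends to yield a set of images that look alike, while an abstract one ("theory") yields visually diverse images, so the spread of a concept's images signals whether its visual representation is reliable. As a rough illustration, the sketch below scores a concept by the average pairwise cosine distance among its image feature vectors and gates the perceptual modality on that score; the function names and the threshold policy are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def image_dispersion(image_vectors):
    """Average pairwise cosine distance among a concept's image vectors.

    image_vectors: array of shape (n, d), one feature vector per image
    (e.g. a bag-of-visual-words histogram). Low dispersion means the
    images look alike (typical of concrete concepts); high dispersion
    means they are visually diverse (typical of abstract concepts).
    """
    X = np.asarray(image_vectors, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two images to measure dispersion")
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise rows
    sims = X @ X.T                                    # all pairwise cosine similarities
    iu = np.triu_indices(n, k=1)                      # each unordered pair counted once
    return float(np.mean(1.0 - sims[iu]))             # mean cosine distance

def include_perceptual_input(image_vectors, threshold):
    """Hypothetical gate: fuse image features into the multi-modal
    representation only when dispersion falls below the threshold.
    One plausible choice of threshold is the median dispersion
    computed across the whole vocabulary."""
    return image_dispersion(image_vectors) < threshold
```

Because the score is computed from the images alone, no human concreteness ratings or other supervision are needed, which is what makes the filtering step unsupervised.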
