Modelling Visual Properties and Visual Context in Multimodal Semantics

Multimodal semantic models that extend linguistic representations with additional perceptual input have proved successful in a range of natural language processing (NLP) tasks. However, existing research has extracted visual features from complete images and has not examined how different kinds of visual information affect performance. We construct multimodal models that distinguish between the internal visual properties of objects and their external visual context. We evaluate the models on the task of decoding brain activity associated with the meanings of nouns, demonstrating their advantage over models based on complete images.
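The abstract describes two standard ingredients of this line of work: fusing a textual representation with a visual one (here, either internal object features or external context features), and evaluating the resulting vectors by decoding brain activity for noun stimuli. The sketch below is illustrative only, assuming the common normalize-and-concatenate fusion and the leave-two-out 2-vs-2 matching test introduced by Mitchell et al. (2008); all function names are hypothetical, not taken from the paper.

```python
import numpy as np

def l2_normalize(v, eps=1e-9):
    # Unit-normalize so neither modality dominates by raw magnitude.
    return v / (np.linalg.norm(v) + eps)

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def multimodal_vector(text_vec, visual_vec):
    """Middle fusion (illustrative): normalize each modality, then
    concatenate. visual_vec could come from the object region alone
    (internal properties) or from the surrounding scene (context)."""
    return np.concatenate([l2_normalize(text_vec), l2_normalize(visual_vec)])

def pairwise_accuracy(pred, gold):
    """Leave-two-out 2-vs-2 test: for each pair of concepts, the
    prediction is correct when matching each predicted vector to its
    own gold vector scores higher than the swapped assignment."""
    n = len(pred)
    correct, total = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            match = cos(pred[i], gold[i]) + cos(pred[j], gold[j])
            swap = cos(pred[i], gold[j]) + cos(pred[j], gold[i])
            correct += match > swap
            total += 1
    return correct / total
```

A decoder (e.g. ridge regression from fMRI voxels to the multimodal vectors) would be trained with two concepts held out, its predictions scored with `pairwise_accuracy`, and the procedure repeated over all pairs.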
