Exploring Multi-Modal Text+Image Models to Distinguish between Abstract and Concrete Nouns

This paper explores variants of multi-modal computational models that distinguish between abstract and concrete nouns. We hypothesized that the textual and visual modalities differ in how much information they provide about abstract vs. concrete words. While the overall predictions of our models were highly successful (reaching an accuracy of 96.45% in binary classification and a Spearman correlation of 0.86 in a regression analysis), the differences between the textual, visual, and combined modalities were negligible; both text and images thus seem to provide reliable, non-complementary information for representing both abstract and concrete words.
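To make the evaluation setup concrete, the sketch below illustrates one plausible way to run the two tasks mentioned in the abstract: a binary abstract/concrete classification scored by accuracy, and a regression on graded concreteness ratings scored by Spearman correlation, each over textual, visual, and concatenated embeddings. This is a minimal sketch under stated assumptions, not the authors' actual pipeline: the embedding dimensions, the synthetic stand-in data, and the choice of logistic regression and ridge regression are all illustrative assumptions.

```python
# Minimal sketch of the two evaluations described in the abstract.
# ASSUMPTIONS: precomputed text embeddings (e.g., word2vec-style) and
# visual embeddings (e.g., CNN features averaged over images per word);
# random placeholder data stands in for real embeddings and gold norms.

import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_words = 1000

# Hypothetical stand-ins for the three modalities.
text_emb = rng.standard_normal((n_words, 300))    # textual vectors
image_emb = rng.standard_normal((n_words, 4096))  # visual vectors
combined = np.hstack([text_emb, image_emb])       # simple concatenation

# Hypothetical gold data: binary abstract/concrete labels and graded
# concreteness ratings on a 1-5 scale (as in published rating norms).
labels = rng.integers(0, 2, n_words)
ratings = rng.uniform(1.0, 5.0, n_words)

for name, X in [("text", text_emb), ("image", image_emb), ("combined", combined)]:
    # Binary classification: accuracy from cross-validated predictions.
    preds = cross_val_predict(LogisticRegression(max_iter=1000), X, labels, cv=5)
    acc = (preds == labels).mean()

    # Regression: Spearman correlation of predicted vs. gold ratings.
    pred_ratings = cross_val_predict(Ridge(), X, ratings, cv=5)
    rho, _ = spearmanr(pred_ratings, ratings)

    print(f"{name}: accuracy={acc:.4f}, spearman={rho:.2f}")
```

With real embeddings and rating norms in place of the random placeholders, comparing the per-modality scores in this loop is one way to test whether the modalities carry complementary information, as the abstract investigates.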
