Is this a Child, a Girl or a Car? Exploring the Contribution of Distributional Similarity to Learning Referential Word Meanings

There has recently been a lot of work on using images of the referents of words to improve vector-space meaning representations derived from text. We investigate the opposite direction, as it were: improving visual word predictors that identify objects in images by exploiting distributional similarity information during training. We show that for certain words (such as entry-level nouns or hypernyms), we can indeed learn better referential word meanings by taking their semantic similarity to other words into account. For other words, there is no effect, or even a detrimental one, compared to a learning setup that presents even semantically related objects as negative instances.
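One way to read "exploiting distributional similarity information during training" is as similarity-aware negative sampling for per-word visual classifiers. The sketch below illustrates that reading only; it is not the authors' code, and the names `train_word_classifier`, `word_vecs`, and `sim_threshold` are illustrative. It assumes one binary logistic-regression classifier per word, trained on visual feature vectors, where candidate negatives whose labels are distributionally similar to the target word are filtered out rather than presented as negatives.

```python
# Minimal sketch (assumptions, not the paper's implementation):
# a binary visual-word classifier per word, with distributionally
# similar objects excluded from the negative training instances.
import numpy as np
from sklearn.linear_model import LogisticRegression

def cosine(u, v):
    # cosine similarity between two distributional word vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def train_word_classifier(word, data, word_vecs, sim_threshold=0.4):
    """word: target word; data: iterable of (visual_features, label)
    pairs; word_vecs: dict mapping words to distributional vectors."""
    X, y = [], []
    for feats, label in data:
        if label == word:
            X.append(feats)
            y.append(1)
        elif cosine(word_vecs[word], word_vecs[label]) < sim_threshold:
            # only distributionally dissimilar objects become negatives;
            # similar ones (e.g. "girl" when training "child") are skipped
            X.append(feats)
            y.append(0)
    return LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
```

Under this reading, the baseline setup corresponds to `sim_threshold = 1.0` (every other object is a negative, even semantically related ones), and lowering the threshold is what helps for entry-level nouns and hypernyms but not for other words.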
