If Sentences Could See: Investigating Visual Information for Semantic Textual Similarity

We investigate the effects of incorporating visual signal from images into unsupervised Semantic Textual Similarity (STS) measures. In some settings, STS measures that exploit visual signal alone outperform linguistic-only measures by a wide margin, and multi-modal measures yield further performance gains. We also show that selectively including visual information may further boost performance in the multi-modal setup.
