UNIBA: Distributional Semantics for Textual Similarity

We report the results of the UNIBA team's participation in the first Semantic Textual Similarity task, held at SemEval-2012. Our systems rely on distributional models of words automatically inferred from a large corpus. We exploit three different semantic word spaces: Random Indexing (RI), Latent Semantic Analysis (LSA) over RI, and vector permutations in RI. Runs based on these spaces consistently outperform the baseline on the proposed datasets.
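
To make the Random Indexing idea concrete, the following is a minimal sketch and not the system described above. It assumes that each word receives a sparse ternary random index vector, that a word's semantic vector is the sum of the index vectors of its neighbours within a small window, and that a sentence is represented by the sum of its word vectors and compared with cosine similarity. The constants `DIM`, `SEEDS` and `WINDOW`, the function names, and the additive composition step are all illustrative assumptions; the LSA and permutation spaces are not covered.

```python
import math
import random

DIM = 200     # dimensionality of the reduced space (assumed value)
SEEDS = 10    # non-zero entries per random index vector (assumed value)
WINDOW = 2    # context words considered on each side (assumed value)


def index_vector(rng):
    """Sparse ternary random vector: SEEDS entries set to +1 or -1, the rest 0."""
    vec = [0.0] * DIM
    for pos in rng.sample(range(DIM), SEEDS):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def build_space(corpus, seed=0):
    """Build word vectors by summing the index vectors of co-occurring words."""
    rng = random.Random(seed)
    vocab = sorted({w for sent in corpus for w in sent})
    index = {w: index_vector(rng) for w in vocab}
    space = {w: [0.0] * DIM for w in vocab}
    for sent in corpus:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - WINDOW), min(len(sent), i + WINDOW + 1)
            target = space[word]
            for j in range(lo, hi):
                if j == i:
                    continue
                ctx = index[sent[j]]
                for k in range(DIM):
                    target[k] += ctx[k]
    return space


def sentence_vector(space, tokens):
    """Compose a sentence by vector addition (an assumed composition step)."""
    vec = [0.0] * DIM
    for tok in tokens:
        # Words missing from the space contribute nothing.
        for k, value in enumerate(space.get(tok, ())):
            vec[k] += value
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Given a tokenised corpus, `build_space(corpus)` returns the word space, and `cosine(sentence_vector(space, s1), sentence_vector(space, s2))` yields a similarity score for two tokenised sentences. A real system would use a far larger corpus and tuned parameters.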
