Evaluation of Simple Distributional Compositional Operations on Longer Texts

Distributional semantic models have been effective at representing linguistic semantics at the word level, and more recently research has moved on to the construction of distributional representations for larger segments of text. However, it is not well understood how the composition operators that work well on short phrase-based models scale up to full-length sentences. In this paper we test several simple compositional methods on a sentence-length similarity task and discover that their performance peaks at fewer than ten operations. We also introduce a novel sentence segmentation method that reduces the number of compositional operations.

[1]  Michael N Jones,et al.  Representing word meaning and order information in a composite holographic lexicon. , 2007, Psychological review.

[2]  Mirella Lapata,et al.  Composition in Distributional Models of Semantics , 2010, Cogn. Sci..

[3]  Eneko Agirre,et al.  *SEM 2013 shared task: Semantic Textual Similarity , 2013, *SEMEVAL.

[4]  Mirella Lapata,et al.  Vector-based Models of Semantic Composition , 2008, ACL.

[5]  S. Russel and P. Norvig,et al.  “Artificial Intelligence – A Modern Approach”, Second Edition, Pearson Education, 2003. , 2015 .

[6]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[7]  Ehud Rivlin,et al.  Placing search in context: the concept revisited , 2002, TOIS.

[8]  Patrick Pantel,et al.  From Frequency to Meaning: Vector Space Models of Semantics , 2010, J. Artif. Intell. Res..

[9]  Tony Plate,et al.  Holographic Reduced Representations: Convolution Algebra for Compositional Distributed Representations , 1991, IJCAI.

[10]  Gemma Boleda,et al.  Distributional Semantics in Technicolor , 2012, ACL.

[11]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[12]  T. Landauer,et al.  A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. , 1997 .

[13]  Eneko Agirre,et al.  SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity , 2012, *SEMEVAL.

[14]  Marco Baroni,et al.  Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space , 2010, EMNLP.