Algebraic Compositional Models for Semantic Similarity in Ranking and Clustering

Although distributional models of word meaning have been widely used in Information Retrieval, providing an effective representation and generalization schema for words in isolation, the composition of words into phrases or sentences remains a challenging task. Several methods have been proposed that account for syntactic structure by combining words through algebraic operators (e.g. the tensor product) applied to the vectors representing the lexical constituents. In this paper, a novel approach to semantic composition based on space projection techniques over basic geometric lexical representations is proposed. In the geometric perspective pursued here, syntactic bi-grams are projected into the so-called Support Subspace, which emphasizes the semantic features shared by the compound words and better captures phrase-specific aspects of the involved lexical meanings. State-of-the-art results are achieved on a well-known benchmark for the phrase similarity task, and the generalization capability of the proposed operators is investigated in a cross-linguistic scenario, i.e. in English and Italian.
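
As an illustration of the projection idea, the sketch below computes phrase similarity by restricting the comparison of two syntactic bi-grams to a Support Subspace. It is a minimal, hypothetical reconstruction rather than the paper's exact formulation: the assumption that the subspace consists of the k dimensions with the largest component-wise product of the two constituent vectors, the head/modifier averaging scheme, and all names (support_subspace, phrase_similarity, the toy vectors) are illustrative choices.

```python
import numpy as np

def support_subspace(u, v, k=10):
    """Indices of the k dimensions where the component-wise product of the
    two constituent vectors is largest (assumed selection criterion for the
    Support Subspace)."""
    return np.argsort(u * v)[-k:]

def project(x, idx, dim):
    """Zero out every component of x outside the selected subspace."""
    p = np.zeros(dim)
    p[idx] = x[idx]
    return p

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def phrase_similarity(pair1, pair2, k=10):
    """Similarity of two syntactic bi-grams: each pair is projected into its
    own Support Subspace and the projected constituents are compared
    (one of several plausible combination schemes)."""
    (u1, v1), (u2, v2) = pair1, pair2
    dim = u1.shape[0]
    idx1 = support_subspace(u1, v1, k)
    idx2 = support_subspace(u2, v2, k)
    # Compare head with head and modifier with modifier in the respective
    # subspaces, then average the two cosine scores.
    sim_head = cosine(project(u1, idx1, dim), project(u2, idx2, dim))
    sim_mod = cosine(project(v1, idx1, dim), project(v2, idx2, dim))
    return 0.5 * (sim_head + sim_mod)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 100
    # Toy co-occurrence vectors standing in for LSA-style word representations.
    buy, purchase, car, house = (rng.random(dim) for _ in range(4))
    print(phrase_similarity((buy, car), (purchase, car)))
    print(phrase_similarity((buy, car), (purchase, house)))
```

Restricting the cosine to the selected dimensions is what lets the features shared by the compound words dominate the comparison, rather than the full co-occurrence profile of each word in isolation.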
