Determining Compositionality of Expressions Using Various Word Space Models and Methods

This paper presents a comparative study of five types of Word Space Models (WSMs) combined with four compositionality measures, applied to the task of automatically determining the semantic compositionality of word expressions. Many of these combinations of WSMs and measures have not previously been applied to this task. The study follows Biemann and Giesbrecht (2011), who attempted to identify expressions for which the compositionality assumption does not hold, that is, expressions whose meaning is not determined by the meanings of their constituents and the way those constituents are combined. Our results are promising and should be of interest to researchers working on WSMs, compositionality, or the relevant evaluation methods.
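
The abstract does not name the four measures, so the following is only a minimal sketch of how one common kind of compositionality measure could work, under loudly stated assumptions. It builds raw co-occurrence vectors from a small made-up corpus, treats each multiword expression as a single pre-joined token (such as `black_car`), and scores compositionality as the cosine similarity between the expression's own vector and the sum of its constituents' vectors, in the spirit of the additive composition model studied in [25]. All function names and the corpus are hypothetical, and this is not claimed to be one of the paper's measures; fuller WSMs such as Latent Semantic Analysis [12] or Random Indexing [4] would add weighting and dimensionality reduction that this sketch omits.

```python
from collections import Counter, defaultdict
import math


def build_word_space(sentences, window=2):
    """Hypothetical helper: map each token to a sparse vector of raw
    co-occurrence counts with tokens at most `window` positions away."""
    space = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    space[word][tokens[j]] += 1
    return space


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def additive_compositionality(space, expression, constituents):
    """Hypothetical measure: compare the expression's own vector with the
    sum of its constituents' vectors; a low score suggests the expression
    is less compositional."""
    composed = sum((space[w] for w in constituents), Counter())
    return cosine(space[expression], composed)


# Made-up toy corpus; each multiword expression is pre-joined into one token.
corpus = [
    "the black_car drives".split(),
    "the car drives".split(),
    "the black cat".split(),
    "the red_tape delays".split(),
    "red wine".split(),
    "sticky tape".split(),
]
space = build_word_space(corpus, window=1)
print(additive_compositionality(space, "black_car", ["black", "car"]))
print(additive_compositionality(space, "red_tape", ["red", "tape"]))
```

On this toy data `black_car` scores about 0.87, while `red_tape` scores 0.0 because its contexts share nothing with those of `red` and `tape`; the corpus is deliberately constructed so that the contrast is obvious.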

[1] Hung T. Nguyen, et al. Probability for statistics, 1989.

[2] Sivaji Bandyopadhyay, et al. Shared Task System Description: Measuring the Compositionality of Bigrams using Statistical Methodologies, 2011.

[3] Darren Pearce. A Comparative Evaluation of Collocation Extraction Techniques, 2002, LREC.

[4] Magnus Sahlgren, et al. An Introduction to Random Indexing, 2005.

[5] Aline Villavicencio, et al. Identification and Treatment of Multiword Expressions Applied to Information Retrieval, 2011, MWE@ACL.

[6] Silvia Bernardini, et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora, 2009, Lang. Resour. Evaluation.

[7] Richard Cole, et al. Concept learning and information inferencing on a high-dimensional semantic space, 2004.

[8] John Carroll, et al. Detecting a Continuum of Compositionality in Phrasal Verbs, 2003, ACL 2003.

[9] Suresh Manandhar, et al. An Empirical Study on Compositionality in Compound Nouns, 2011, IJCNLP.

[10] Timothy Baldwin, et al. An Empirical Model of Multiword Expression Decomposability, 2003, ACL 2003.

[11] Patrick Pantel, et al. From Frequency to Meaning: Vector Space Models of Semantics, 2010, J. Artif. Intell. Res.

[12] T. Landauer, et al. Indexing by Latent Semantic Analysis, 1990.

[13] Richard A. Harshman, et al. Indexing by Latent Semantic Analysis, 1990, J. Am. Soc. Inf. Sci.

[14] Khalid Choukri, et al. The European Language Resources Association, 1998, LREC.

[15] Christian Biemann, et al. Distributional Semantics and Compositionality 2011: Shared Task Description and Results, 2011.

[16] T. Landauer, et al. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge, 1997.

[17] Marine Carpuat, et al. Task-based Evaluation of Multiword Expressions: a Pilot Study in Statistical Machine Translation, 2010, NAACL.

[18] Suresh Manandhar, et al. Exemplar-Based Word-Space Model for Compositionality Detection: Shared Task System Description, 2011.

[19] Magnus Sahlgren, et al. The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces, 2006.

[20] P. Kanerva, et al. Permutations as a means to encode order in word space, 2008.

[21] Anders Søgaard, et al. Shared Task System Description: Frustratingly Hard Compositionality Prediction, 2011.

[22] Dekang Lin, et al. Automatic Identification of Non-compositional Phrases, 1999, ACL.

[23] Zellig S. Harris, et al. Distributional Structure, 1954.

[24] Stefan Evert, et al. The Statistics of Word Cooccurrences: Word Pairs and Collocations, 2004.

[25] Mirella Lapata, et al. Vector-based Models of Semantic Composition, 2008, ACL.

[26] Karel Jezek, et al. Determining Compositionality of Word Expressions Using Word Space Models, 2013, MWE@NAACL-HLT.

[27] Gerard Salton, et al. The SMART Retrieval System: Experiments in Automatic Document Processing, 1971.

[28] Curt Burgess, et al. Producing high-dimensional semantic spaces from lexical co-occurrence, 1996.

[29] Karel Jezek, et al. Detection of Semantic Compositionality Using Semantic Spaces, 2012, TSD.

[30] Preslav Nakov, et al. Weight functions impact on LSA performance, 2001.

[31] Pavel Pecina. Reference Data for Czech Collocation Extraction, 2008.

[32] Douglas L. T. Rohde, et al. An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence, 2005.

[33] Keith Stevens, et al. The S-Space Package: An Open Source Package for Word Space Models, 2010, ACL.