Recognising Compositionality of Multi-Word Expressions in the Wordnet Oriented Perspective

We propose a method for recognising the compositionality of Multi-Word Expressions (MWEs). First, we study associations between MWEs and the structure of wordnet lexico-semantic relations, and discuss a simple method of splitting plWordNet’s MWEs into compositional and non-compositional ones on the basis of the hypernymy structure. Our main goal, however, is to build a classifier for the recognition of compositional MWEs, assuming that MWEs have already been detected. For this task we performed several experiments with different classification algorithms: a Naive Bayes classifier, a multinomial logistic regression model with a ridge estimator, and a Decision Table classifier. The heterogeneous feature set is based on the t-score measure for word co-occurrences, a Measure of Semantic Relatedness, and the lexico-syntactic structure of MWEs. Finally, MWE compositionality classification is analysed as a knowledge source for automated wordnet expansion.
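
As an illustration of the first feature, the t-score of a word pair is often approximated as (O - E) / sqrt(O), where O is the observed co-occurrence count and E is the count expected if the two words were independent. The sketch below assumes this common textbook formulation; the abstract does not state the exact variant used, and the function name, parameter names and counts are hypothetical.

```python
import math


def t_score(pair_count, first_count, second_count, total_pairs):
    """Approximate t-score association for a two-word expression.

    Assumes the common (O - E) / sqrt(O) form; the exact variant used
    in the paper is not given in the abstract.
    """
    if pair_count <= 0:
        # No observed co-occurrence: treat the association as zero.
        return 0.0
    # Expected co-occurrence count under independence of the two words.
    expected = first_count * second_count / total_pairs
    return (pair_count - expected) / math.sqrt(pair_count)


# Hypothetical counts: the pair occurs 20 times, its words 100 and 200
# times, in a sample of 100,000 word pairs.
score = t_score(20, 100, 200, 100_000)
```

A high score indicates that the words co-occur more often than chance would predict. Such a value would be only one input to the classifier, alongside the semantic relatedness and lexico-syntactic features.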
