Towards a Variability Measure for Multiword Expressions

One of the outstanding properties of multiword expressions (MWEs), especially verbal ones (VMWEs), important in both theoretical models and applications, is their idiosyncratic variability. Some MWEs are always continuous, while others admit certain types of insertions. Components of some MWEs are rarely or never modified, while components of others admit either specific or unrestricted modification. This unpredictable variability profile hinders modeling and processing MWEs as "words-with-spaces" on the one hand, and as regular syntactic structures on the other. Since the variability of MWEs is a matter of degree rather than a binary property, we propose a two-dimensional, language-independent measure of variability dedicated to verbal MWEs, based on syntactic and discontinuity-related cues. We assess its relevance with respect to a linguistic benchmark and its utility for the tasks of VMWE classification and variant identification on a French corpus.
