Combining resources for MWE-token classification

We study the task of automatically disambiguating word combinations such as "jump the gun", which are ambiguous between a literal and an MWE interpretation, focusing on the utility of type-level features from an MWE lexicon for the disambiguation task. To this end, we combine gold-standard idiomaticity annotations for tokens in the OpenMWE corpus with MWE-type-level information drawn from the recently published JDMWE lexicon. We find that constituent modifiability in an MWE-type is more predictive of the idiomaticity of its tokens than other constituent characteristics such as semantic class or part of speech.

Timothy Baldwin | Richard Fothergill
