Paper Information - Detecting Multi-Word Expressions Improves Word Sense Disambiguation

Detecting Multi-Word Expressions Improves Word Sense Disambiguation

Multi-Word Expressions (MWEs) are prevalent in text and are also, on average, less polysemous than mono-words. This suggests that accurate MWE detection should lead to a non-trivial improvement in Word Sense Disambiguation (WSD). We show that a straightforward MWE detection strategy, due to Arranz et al. (2005), can increase a WSD algorithm's baseline f-measure by 5 percentage points. Our measurements are consistent with Arranz's, and our study goes further by using a portion of the Semcor corpus containing 12,449 MWEs - over 30 times more than the approximately 400 used by Arranz. We also show that perfect MWE detection over Semcor only nets a total 6 percentage point increase in WSD f-measure; therefore there is little room for improvement over the results presented here. We provide our MWE detection algorithms, along with a general detection framework, in a free, open-source Java library called jMWE.
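To make the two quantities in the abstract concrete, here is a minimal sketch of a lexicon-based MWE detector and an f-measure computation. It is not the jMWE API and not necessarily the Arranz et al. strategy: greedy longest-match lookup is swapped in as one common straightforward approach, and the names `TOY_LEXICON`, `detect_mwes`, and `f_measure` are hypothetical, as is the toy lexicon itself.

```python
from typing import Iterable, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive

# Hypothetical toy lexicon; a real system would take its MWE entries
# from a lexical resource rather than a hand-written set.
TOY_LEXICON: Set[Tuple[str, ...]] = {
    ("kick", "the", "bucket"),
    ("new", "york"),
    ("new", "york", "city"),
}


def detect_mwes(tokens: List[str],
                lexicon: Set[Tuple[str, ...]],
                max_len: int = 4) -> List[Span]:
    """Greedy left-to-right longest-match detection (an assumed strategy)."""
    lowered = [t.lower() for t in tokens]
    spans: List[Span] = []
    i = 0
    while i < len(lowered):
        matched = False
        # Try the longest candidate first; an MWE has at least two tokens.
        for n in range(min(max_len, len(lowered) - i), 1, -1):
            if tuple(lowered[i:i + n]) in lexicon:
                spans.append((i, i + n))
                i += n  # skip past the matched expression
                matched = True
                break
        if not matched:
            i += 1
    return spans


def f_measure(predicted: Iterable[Span], gold: Iterable[Span]) -> float:
    """Harmonic mean of precision and recall over exact span matches."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


tokens = "He moved to New York City last year".split()
spans = detect_mwes(tokens, TOY_LEXICON)
score = f_measure(spans, [(3, 6)])
```

Here the detector prefers the three-token entry over the shorter "new york" because longer candidates are tried first. Because an MWE is on average less polysemous than its component words, a downstream WSD step would then have fewer candidate senses to choose from.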

Mark A. Finlayson | Nidhi Kulkarni

[1] Dan Klein, et al. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network, 2003, NAACL.

[2] Victoria Arranz, et al. Multiwords and Word Sense Disambiguation, 2005, CICLing.

[3] Dan I. Moldovan, et al. Word sense disambiguation of WordNet glosses, 2004, Comput. Speech Lang.

[4] Mark A. Finlayson, et al. Source code and data for MWE'2011 papers, 2011.

[5] German Rigau, et al. The TALP systems for disambiguating WordNet glosses, 2004, SENSEVAL@ACL.

[6] Rada Mihalcea, et al. Word Sense Disambiguation, 2015, Encyclopedia of Machine Learning.

[7] Christiane Fellbaum, et al. Book Reviews: WordNet: An Electronic Lexical Database, 1999, CL.

[8] Eneko Agirre, et al. Word Sense Disambiguation: Algorithms and Applications, 2007.