Paper information - Identifying Multi-word Expressions by Leveraging Morphological and Syntactic Idiosyncrasy

Multi-word expressions constitute a significant portion of the lexicon of every natural language, and handling them correctly is mandatory for various NLP applications. Yet such entities are notoriously hard to define, and are consequently missing from standard lexicons and dictionaries. Multi-word expressions exhibit idiosyncratic behavior on various levels: orthographic, morphological, syntactic and semantic. In this work we take advantage of the morphological and syntactic idiosyncrasy of Hebrew noun compounds and employ it to extract such expressions from text corpora. We show that relying on linguistic information dramatically improves the accuracy of compound extraction, reducing over one third of the errors compared with the best baseline.

Shuly Wintner | Hassan Al-Haj
