Paper Information - Pattern-Based Extraction of Negative Polarity Items from Dependency-Parsed Text

We describe a new method for extracting Negative Polarity Item (NPI) candidates from dependency-parsed German text corpora. Semi-automatic extraction of NPIs is a challenging task because NPIs have no uniform categorical or other syntactic properties that could be used to detect them; they occur as single words or as multi-word expressions of almost any syntactic category. Their defining property is semantic: they may only occur in the scope of negation and related semantic operators. In contrast to an earlier approach to NPI extraction from corpora, we specifically target multi-word expressions. Besides applying statistical methods to measure the co-occurrence of our candidate expressions with negative contexts, we also apply linguistic criteria in an attempt to determine the degree to which they are idiomatic. We evaluate our method by comparing the set of NPIs we found with the most comprehensive electronic list of German NPIs, which currently contains 165 entries. Our method retrieved 142 NPIs, 114 of which are new.
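The statistical step mentioned in the abstract, measuring how strongly a candidate expression co-occurs with negative contexts, can be illustrated with a small sketch. The abstract does not say which association measure the authors use; the log-likelihood ratio below is an assumption chosen because it is a common collocation statistic, and the function names and counts are hypothetical.

```python
import math


def negative_ratio(in_negative, elsewhere):
    """Share of a candidate's occurrences that fall in negative contexts."""
    total = in_negative + elsewhere
    return in_negative / total if total else 0.0


def log_likelihood(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    Assumed layout (hypothetical, not taken from the paper):
      k11: candidate occurrences in negative contexts
      k12: candidate occurrences in other contexts
      k21: all other expressions in negative contexts
      k22: all other expressions in other contexts
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


if __name__ == "__main__":
    # Invented counts: a candidate seen 50 times, 48 of them under negation.
    print(negative_ratio(48, 2))
    print(log_likelihood(48, 2, 1000, 9000))
```

A candidate whose distribution across negative and other contexts matches the corpus as a whole scores close to zero, while one concentrated in negative contexts scores high. Note that G^2 is two-sided, so in practice it would be paired with a check such as `negative_ratio` to confirm the association points toward negative contexts.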

Fabienne Fritzinger | Frank Richter | Marion Weller
