Inherently Pronominal Verbs in Czech: Description and Conversion Based on Treebank Annotation

This paper describes the results of a study related to the PARSEME Shared Task on automatic detection of verbal Multi-Word Expressions (MWEs), which focuses on identifying them in running text in many languages. The Shared Task organizers have provided basic annotation guidelines that define four basic types of verbal MWEs, including some specific subtypes. Czech is among the twenty languages selected for the task. We will contribute to the Shared Task dataset, a multilingual open resource, by converting data from the Prague Dependency Treebank (PDT) to the Shared Task format. The question to answer is to what extent this can be done automatically. In this paper, we concentrate on one of the relevant MWE categories, namely the quasi-universal category called “Inherently Pronominal Verbs” (IPronV), and describe its annotation in the Prague Dependency Treebank. After comparing it to the Shared Task guidelines, we conclude that the PDT and the associated valency lexicon, PDT-Vallex, contain sufficient information for the conversion, even though some specific instances will have to be checked. As a side effect, we have identified certain errors in the PDT annotation which can now be corrected automatically.
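The kind of conversion discussed above can be illustrated with a minimal sketch. This is not the authors' procedure. It rests on three assumptions: (1) an inherently pronominal verb can be recognized from a tectogrammatical lemma that carries the clitic as a suffix (e.g. `bát_se`); (2) the surface clitic `se`/`si` is attached directly to its verb; and (3) the output is a simplified MWE column with one code per token, where the first token of the k-th MWE is labeled `k:IPronV` and its other token `k`. The `Token` class and the `mwe_column` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

CATEGORY = "IPronV"
# Assumption: inherently pronominal verbs have t-lemmas such as "bát_se" or "všimnout_si".
REFLEXIVE_SUFFIXES = ("_se", "_si")


@dataclass
class Token:
    """Hypothetical, simplified token: surface form, head index, optional t-lemma."""
    form: str
    head: Optional[int] = None      # 0-based index of the governing token; None for the root
    t_lemma: Optional[str] = None   # tectogrammatical lemma; only verbs need one here


def mwe_column(sentence: List[Token]) -> Tuple[List[str], List[int]]:
    """Return (codes, unresolved).

    codes has one entry per token: "_" outside any MWE, "k:IPronV" on the first
    token of the k-th MWE and "k" on its remaining token. unresolved lists the
    indices of verbs whose t-lemma marks them as IPronV but which have no
    attached reflexive clitic, so they can be checked by hand.
    """
    codes = ["_"] * len(sentence)
    unresolved: List[int] = []
    mwe_id = 0
    for v, verb in enumerate(sentence):
        if verb.t_lemma is None or not verb.t_lemma.endswith(REFLEXIVE_SUFFIXES):
            continue
        clitic = verb.t_lemma.rsplit("_", 1)[1]  # "se" or "si"
        # Look for an unused clitic token with the matching form governed by this verb.
        match = next(
            (c for c, tok in enumerate(sentence)
             if tok.head == v and tok.form.lower() == clitic and codes[c] == "_"),
            None,
        )
        if match is None:
            unresolved.append(v)
            continue
        mwe_id += 1
        first, second = sorted((v, match))  # the category label goes on the earlier token
        codes[first] = f"{mwe_id}:{CATEGORY}"
        codes[second] = str(mwe_id)
    return codes, unresolved


# "Petr se bojí psa." (Petr is afraid of the dog): bát_se is inherently pronominal.
sentence = [
    Token("Petr", head=2),
    Token("se", head=2),
    Token("bojí", t_lemma="bát_se"),
    Token("psa", head=2),
    Token(".", head=2),
]
print(mwe_column(sentence))  # → (['_', '1:IPronV', '1', '_', '_'], [])
```

Verbs whose lemma signals an IPronV but for which no attached clitic is found are returned separately rather than converted blindly. This reflects the observation that some instances will have to be checked, and it is also a cheap way to surface possible annotation errors.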
