Paper information - Using Very Large Parsed Corpora and Judgment Data to Classify Verb Reflexivity

Dutch has two reflexive pronouns, zich and zichzelf. When is each one used? This question has long been debated in the literature on binding theory, reflexives and anaphora resolution. Partial solutions have used syntactic binding domains, semantic features and pragmatic concepts such as focus to predict reflexive choice, but until now no experimental data have been available either for or against any of these theories. In this paper we examine reflexive choice on the basis of empirical data from two studies: a large-scale corpus study and an online questionnaire. On the basis of the results of both, we are able to predict the choice between the two reflexive items in Dutch without assuming a priori a distinction between verbs that occur with zich and verbs that occur with zichzelf (cf. distinctions such as 'inherent reflexivity' (Reinhart and Reuland, 1993)). Instead, we examine the distribution of zich and zichzelf in the Clef corpus, a 70-million-word Very Large Corpus of Dutch that is tagged and parsed. This allows us to identify the typical action each verb is used to describe: a reflexive or a non-reflexive action. Regression analysis shows that, on this basis, 21% of the distribution of the two reflexive items in Dutch can be predicted. The verb reflexivity found in the corpus study moreover explains 83% of the participants' choices between zich and zichzelf in the online study. As such, both the corpus study and the online questionnaire confirm the group of verbs called 'inherent reflexive verbs' without postulating the group beforehand. We further found that even inherently reflexive verbs, which are argued never to co-occur with zichzelf, were sometimes given zichzelf as the preferred argument in the questionnaire and, to a lesser degree, in the corpus, suggesting that the verb classes are tendencies rather than categorical distinctions.
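The kind of analysis described above can be illustrated with a minimal sketch. It assumes, hypothetically, that a verb's reflexivity is measured as the share of its corpus occurrences that describe a reflexive action, and that a simple linear regression relates this score to the share of zich among the verb's reflexive pronouns. The abstract does not give the authors' actual measure or model, and the verb labels and counts below are invented placeholders, not data from the Clef corpus.

```python
def reflexivity_score(reflexive_uses, total_uses):
    """Share of a verb's occurrences that describe a reflexive action."""
    return reflexive_uses / total_uses


def r_squared(xs, ys):
    """Proportion of variance in ys explained by a simple linear fit on xs.

    For a one-predictor least-squares regression this equals the squared
    Pearson correlation between xs and ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical placeholder counts per verb (NOT taken from the paper):
# (reflexive uses, total uses, occurrences with zich, occurrences with zichzelf)
toy_counts = {
    "verb_a": (90, 100, 85, 5),
    "verb_b": (60, 100, 40, 20),
    "verb_c": (20, 100, 6, 14),
    "verb_d": (5, 100, 1, 4),
}

scores = [reflexivity_score(r, t) for r, t, _, _ in toy_counts.values()]
zich_share = [z / (z + zz) for _, _, z, zz in toy_counts.values()]

print(f"R^2 on toy data: {r_squared(scores, zich_share):.2f}")
```

The R² value printed here only reflects the invented counts; it is the analogue of the "21%" and "83%" figures in the abstract, not a reproduction of them.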
