Paper information - Towards Robust Animacy Classification Using Morphosyntactic Distributional Features

This paper presents results from experiments in automatic animacy classification of Norwegian nouns using decision-tree classifiers. The method uses relative frequency measures for linguistically motivated morphosyntactic features extracted from an automatically annotated corpus of Norwegian. The classifiers are evaluated using leave-one-out training and testing. Initial results are promising (approaching 90% accuracy) for high-frequency nouns but deteriorate gradually as lower-frequency nouns are classified. Experiments attempting to empirically locate a frequency threshold for the classification method indicate that a subset of the chosen morphosyntactic features exhibits a notable resilience to data sparseness. The results show that the classification accuracy obtained for high-frequency nouns (absolute frequencies >1000) can be maintained for nouns with considerably lower frequencies (~50) by backing off to a smaller set of features at classification.
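
The evaluation setup described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: a one-level decision tree (a decision stump) stands in for the decision-tree classifier, the feature names `subj`, `obj` and `gen` are hypothetical, the toy data is invented for illustration, and the back-off threshold of 1000 is only borrowed from the abstract's definition of high-frequency nouns.

```python
from collections import Counter

# Toy data invented for illustration only (NOT from the paper).
# Each noun is a dict of relative-frequency features plus a label.
# The feature names "subj", "obj", "gen" are hypothetical.
NOUNS = [
    ({"subj": 0.60, "obj": 0.10, "gen": 0.20}, "animate"),
    ({"subj": 0.55, "obj": 0.15, "gen": 0.10}, "animate"),
    ({"subj": 0.50, "obj": 0.20, "gen": 0.15}, "animate"),
    ({"subj": 0.20, "obj": 0.50, "gen": 0.05}, "inanimate"),
    ({"subj": 0.25, "obj": 0.45, "gen": 0.02}, "inanimate"),
    ({"subj": 0.15, "obj": 0.55, "gen": 0.01}, "inanimate"),
]

FULL_FEATURES = ["subj", "obj", "gen"]   # assumed full feature set
BACKOFF_FEATURES = ["subj", "obj"]       # assumed smaller back-off set


def relative_frequencies(counts, total):
    """Turn raw feature counts for one noun into relative frequencies."""
    return {name: c / total for name, c in counts.items()}


def select_features(noun_frequency, threshold=1000):
    """Back off to the smaller feature set for nouns at or below the threshold."""
    return FULL_FEATURES if noun_frequency > threshold else BACKOFF_FEATURES


def train_stump(data, features):
    """Fit a one-level decision tree: the split with the fewest training errors."""
    best = None
    for f in features:
        values = sorted({x[f] for x, _ in data})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between two observed values
            left = [y for x, y in data if x[f] <= t]
            right = [y for x, y in data if x[f] > t]
            l_label = Counter(left).most_common(1)[0][0]
            r_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_label for y in left) + sum(y != r_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_label, r_label)
    if best is None:
        # No feature varies: always predict the majority label.
        majority = Counter(y for _, y in data).most_common(1)[0][0]
        return (None, 0.0, majority, majority)
    return best[1:]


def predict(stump, x):
    """Classify one noun with a trained stump."""
    f, t, l_label, r_label = stump
    if f is None:
        return l_label
    return l_label if x[f] <= t else r_label


def leave_one_out_accuracy(data, features):
    """Train on all nouns but one, test on the held-out noun, and average."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += predict(train_stump(train, features), x) == y
    return correct / len(data)


if __name__ == "__main__":
    print(leave_one_out_accuracy(NOUNS, FULL_FEATURES))
    print(leave_one_out_accuracy(NOUNS, select_features(noun_frequency=50)))
```

In this sketch the back-off step is just a choice of feature list based on a noun's absolute corpus frequency; which features survive in the smaller set, and where the threshold lies, are the empirical questions the paper investigates.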

Lilja Øvrelid
