Automatic animacy classification for Dutch

We present an automatic animacy classifier for Dutch that determines the animacy status of nouns, that is, how alive the noun's referent is (human, inanimate, etc.). Animacy is a semantic property that has been shown to play a role in human sentence processing, felicity and grammaticality. Although animacy is not marked explicitly in Dutch, we expect knowledge about animacy to be helpful for parsing, translation and other NLP tasks. Only a few animacy classifiers and animacy-annotated corpora exist internationally. For Dutch, animacy information is only available in the Cornetto lexical-semantic database. We augment this lexical information with context information from the Dutch Lassy Large treebank to create training data for an animacy classifier that uses a novel kind of context features. We use the k-nearest neighbour algorithm with distributional lexical features, e.g. how frequently the noun occurs as a subject of the verb 'to think' in a corpus, to decide on the (predominant) animacy class. The size of the Lassy Large corpus makes this possible, and the high level of detail that these word association features provide results in accurate Dutch-language animacy classification.
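
The idea of classifying a noun from its distributional features with k-nearest neighbours can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine similarity measure, the value k = 3, the feature names (such as `subj:denken` for "subject of 'to think'") and all counts are assumptions made up for illustration, not values taken from Cornetto or Lassy Large.

```python
from collections import Counter
from math import sqrt

# Toy training data (hypothetical counts): each item pairs an animacy label
# with sparse counts of how often a noun occurs in a syntactic relation
# with a given verb, e.g. "subj:denken" = subject of 'to think'.
TRAINING = [
    ("human", {"subj:denken": 40, "subj:zeggen": 55, "obj:kopen": 1}),
    ("human", {"subj:denken": 22, "subj:zeggen": 30, "obj:openen": 2}),
    ("inanimate", {"obj:openen": 35, "obj:kopen": 20, "subj:zeggen": 1}),
    ("inanimate", {"obj:kopen": 48, "obj:openen": 5}),
]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(features, training=TRAINING, k=3):
    """Return the majority label among the k most similar training nouns."""
    ranked = sorted(
        training,
        key=lambda item: cosine(features, item[1]),
        reverse=True,
    )
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # A noun seen mostly as the subject of 'to think' and 'to say'.
    print(knn_classify({"subj:denken": 10, "subj:zeggen": 12}))
    # A noun seen mostly as the object of 'to open' and 'to buy'.
    print(knn_classify({"obj:openen": 8, "obj:kopen": 9}))
```

In this sketch a noun is assigned the class that dominates among its nearest neighbours, which mirrors the notion of a predominant animacy class. A realistic setup would use far more features and training nouns than this toy example.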
