Automatic animacy classification

We present an automatic animacy classifier for Dutch that can determine the animacy status of nouns, that is, how alive the noun’s referent is (human, inanimate, etc.). Animacy is a semantic property that has been shown to play a role in human sentence processing, felicity and grammaticality. Although animacy is not marked explicitly in Dutch, we expect knowledge about animacy to be helpful for parsing, translation and other NLP tasks. Only a few animacy classifiers and animacy-annotated corpora exist internationally. For Dutch, animacy information is available only in the Cornetto lexical-semantic database. We augment this lexical information with context information from the Dutch Lassy Large treebank to create training data for an animacy classifier that uses a novel kind of context feature. We use the k-nearest neighbour algorithm with distributional lexical features (e.g. how frequently the noun occurs as the subject of the verb ‘to think’ in a corpus) to decide on the (predominant) animacy class. The size of the Lassy Large corpus makes this possible, and the high level of detail that these word-association features provide results in accurate Dutch-language animacy classification.
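To make the idea of distributional features plus k-nearest-neighbour classification concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity metric, feature weighting or value of k, so this sketch assumes cosine similarity over raw "relation:verb" counts and k = 3. The triples (TRIPLES), labels (GOLD) and helper names are hypothetical toy data standing in for dependency relations extracted from a treebank and animacy labels from a lexical resource.

```python
from collections import Counter, defaultdict
import math


def build_features(triples):
    """Map each noun to a Counter of 'relation:verb' co-occurrence counts."""
    feats = defaultdict(Counter)
    for noun, rel, verb in triples:
        feats[noun][f"{rel}:{verb}"] += 1
    return feats


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (unchanged if all counts are scaled)."""
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, labelled, k=3):
    """Majority animacy label among the k labelled nouns most similar to `query`."""
    ranked = sorted(labelled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy (noun, relation, verb) triples; "su" = subject, "obj1" = direct object.
TRIPLES = [
    ("man", "su", "denken"), ("man", "su", "denken"), ("man", "su", "zeggen"),
    ("vrouw", "su", "denken"), ("vrouw", "su", "zeggen"), ("vrouw", "su", "zeggen"),
    ("kind", "su", "denken"), ("kind", "su", "eten"),
    ("tafel", "obj1", "breken"), ("boek", "obj1", "lezen"), ("boek", "obj1", "lezen"),
    ("glas", "obj1", "breken"), ("glas", "obj1", "breken"),
    ("leraar", "su", "denken"), ("leraar", "su", "zeggen"),
    ("stoel", "obj1", "breken"),
]

# Hypothetical gold labels standing in for a lexical resource such as Cornetto.
GOLD = {"man": "human", "vrouw": "human", "kind": "human",
        "tafel": "inanimate", "boek": "inanimate", "glas": "inanimate"}

features = build_features(TRIPLES)
labelled = [(features[noun], label) for noun, label in GOLD.items()]

print(knn_classify(features["leraar"], labelled))  # human
print(knn_classify(features["stoel"], labelled))   # inanimate
```

The unlabelled noun "leraar" (teacher) appears as the subject of "denken" (to think) and "zeggen" (to say), so its nearest neighbours are the human nouns; "stoel" (chair) appears as the object of "breken" (to break) and lands among the inanimate nouns. Cosine similarity is used here only because it ignores how often a noun occurs overall; a real system would need a much larger feature space and tuned parameters.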
