Multi-class Animacy Classification with Semantic Features

Animacy is the semantic property of nouns denoting whether an entity can act, or is perceived as acting, of its own will. This property is marked grammatically in various languages, albeit rarely in English. It has recently been highlighted as a relevant property for NLP applications such as parsing and anaphora resolution. For animacy to be used in conjunction with other semantic features in such applications, appropriate data is necessary. However, the few corpora that do contain animacy annotation rarely contain much other semantic information. Adding an animacy annotation layer to a corpus that already contains deep semantic annotation should therefore be of particular interest. The work presented in this paper makes three main contributions. First, we improve upon the state of the art in multi-class animacy classification. Second, we use this classifier to contribute to the annotation of an openly available corpus containing deep semantic annotation. Finally, we provide source code, as well as the trained models and scripts needed to reproduce the results presented in this paper or to aid in the annotation of other texts.
