Situation entity types: automatic classification of clause-level aspect

This paper describes the first robust approach to automatically labeling clauses with their situation entity type (Smith, 2003). Situation entity types capture aspectual phenomena at the clause level that are relevant for interpreting both clause-level semantics and discourse structure. Previous work on this task used a small data set from a limited domain and relied mainly on words as features, an approach that is impractical in larger settings. We provide a new corpus of texts from 13 genres (40,000 clauses) annotated with situation entity types. We show that our sequence labeling approach, which uses distributional information in the form of Brown clusters as well as syntactic-semantic features targeted to the task, is robust across genres, reaching accuracies of up to 76%.
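To make the idea of sequence labeling with Brown-cluster features concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation or feature set. It assumes that each clause is represented only by its main verb, that Brown-cluster features are prefixes of a word's cluster bit string (a common practice, not confirmed by the abstract), and that a first-order linear-chain model decodes labels with Viterbi. The label subset, cluster bit strings, prefix lengths, weights, and the function names `brown_prefix_features`, `emission_scores`, and `viterbi` are all hypothetical. The weights are hand-set toy values, whereas a CRF would learn them from annotated data.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical, illustrative subset of situation entity labels (not the full inventory).
LABELS = ["STATE", "EVENT", "GENERIC_SENTENCE"]


def brown_prefix_features(word: str, clusters: Dict[str, str],
                          prefix_lengths: Sequence[int] = (2, 4)) -> List[str]:
    """Features from prefixes of a word's Brown cluster bit string.

    Brown clusters form a binary hierarchy, so a shorter prefix names a
    coarser cluster shared by more words. Unknown words get a single UNK feature.
    """
    path = clusters.get(word.lower())
    if path is None:
        return ["brown=UNK"]
    return [f"brown{n}={path[:n]}" for n in prefix_lengths] + [f"brown={path}"]


def emission_scores(features: List[str],
                    weights: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Score each label as the sum of the weights of the clause's active features."""
    return {y: sum(weights.get((f, y), 0.0) for f in features) for y in LABELS}


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Highest-scoring label sequence under a first-order linear-chain model.

    Assumes a non-empty clause sequence; missing transitions score 0.0.
    """
    best: List[Dict[str, float]] = [dict(emissions[0])]  # best[i][y]: best score ending in y at clause i
    back: List[Dict[str, str]] = [{}]                    # back[i][y]: previous label on that best path
    for i in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in LABELS:
            prev, score = max(
                ((p, best[i - 1][p] + transitions.get((p, y), 0.0)) for p in LABELS),
                key=lambda pair: pair[1],
            )
            best[i][y] = score + emissions[i][y]
            back[i][y] = prev
    # Follow back-pointers from the best label of the last clause.
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy data: cluster bit strings and hand-set weights, not learned values.
TOY_CLUSTERS = {"is": "011010", "seems": "011011", "ran": "101100", "arrived": "101101"}
WEIGHTS = {("brown4=0110", "STATE"): 2.0, ("brown4=1011", "EVENT"): 2.0}
TRANSITIONS = {("STATE", "STATE"): 0.5, ("EVENT", "EVENT"): 0.5}

main_verbs = ["seems", "arrived", "xyzzy"]  # main verb of each clause, in document order
emissions = [emission_scores(brown_prefix_features(v, TOY_CLUSTERS), WEIGHTS)
             for v in main_verbs]
predicted = viterbi(emissions, TRANSITIONS)
print(predicted)  # ['STATE', 'EVENT', 'EVENT']
```

In this toy run, "seems" has no weight of its own but shares the 4-bit cluster prefix with "is", so it still scores as STATE; this is the kind of generalization that distributional features are meant to provide beyond word identity. The third verb is out of vocabulary, so its label comes entirely from the transition weights, which shows how a sequence model can use the labels of neighboring clauses.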

[1] Emmon Bach, et al. The algebra of events, 1986, The Language of Time - A Reader.

[2] Alice ter Meulen, et al. Genericity: An Introduction, 1995.

[3] Robert L. Mercer, et al. Class-Based n-gram Models of Natural Language, 1992, CL.

[4] Nils Reiter, et al. Identifying Generic Noun Phrases, 2010, ACL.

[5] Christiane Fellbaum, et al. MASC: the Manually Annotated Sub-Corpus of American English, 2008, LREC.

[6] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit, 2014, ACL.

[7] Alexis Palmer, et al. Automatic prediction of aspectual class of verbs in context, 2014, ACL.

[8] Jacob Cohen. A Coefficient of Agreement for Nominal Scales, 1960.

[9] Daniel Marcu, et al. Sentence Level Discourse Parsing using Syntactic and Lexical Information, 2003, NAACL.

[10] Michael Richter, et al. Automatic Induction of German Aspectual Verb Classes in a Distributional Framework, 2015, GSCL.

[11] António Branco, et al. Aspectual Type and Temporal Relation Classification, 2012, EACL.

[12] Ido Dagan, et al. TruthTeller: Annotating Predicate Truth, 2013, NAACL.

[13] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[14] S. Roumyana. Aspectual Entities and Tense in Discourse, 2002.

[15] Alessandro Lenci, et al. Computational Models for Event Type Classification in Context, 2008, LREC.

[16] W. Francis. A Standard Corpus of Edited Present-Day American English, 1965.

[17] Kathleen McKeown, et al. Learning Methods to Combine Linguistic Indicators: Improving Aspectual Classification and Revealing Linguistic Insights, 2000, CL.

[18] Wouter Weerkamp, et al. What’s in a Domain? Analyzing Genre and Topic Differences in Statistical Machine Translation, 2015, ACL.

[19] Dan Klein, et al. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network, 2003, NAACL.

[20] Anette Frank, et al. Argumentative texts and clause types, 2016, ArgMining@ACL.

[21] Sharid Loáiciga, et al. English-French Verb Phrase Alignment in Europarl for Tense Translation Modeling, 2014, LREC.

[22] Carlota S. Smith, et al. Time With and Without Tense, 2008.

[23] Van Durme, et al. Extracting implicit knowledge from text, 2009.

[24] Iryna Gurevych, et al. A broad-coverage collection of portable NLP components for building shareable analysis pipelines, 2014, OIAF4HLT@COLING.

[25] Manfred Pinkal, et al. Automatic recognition of habituals: a three-way classification of clausal aspect, 2015, EMNLP.

[26] Christoph M. Friedrich, et al. Feature Subset Selection in Conditional Random Fields for Named Entity Recognition, 2009, RANLP.

[27] J. Fleiss. Measuring nominal scale agreement among many raters, 1971.

[28] Percy Liang, et al. Semi-Supervised Learning for Natural Language, 2005.

[29] Manfred Pinkal, et al. Discourse-sensitive Automatic Identification of Generic Expressions, 2015, ACL.

[30] Yoshua Bengio, et al. Word Representations: A Simple and General Method for Semi-Supervised Learning, 2010, ACL.

[31] Thomas A. Mathew, et al. Supervised categorization for habitual versus episodic sentences, 2009.

[32] Alexis Palmer, et al. Situation Entity Annotation, 2014, LAW@COLING.

[33] Q. McNemar. Note on the sampling error of the difference between correlated proportions or percentages, 1947, Psychometrika.

[34] Dan Klein, et al. Fast Exact Inference with a Factored Model for Natural Language Parsing, 2002, NIPS.

[35] Roman Klinger, et al. Classical Probabilistic Models and Conditional Random Fields, 2007.

[36] D K Smith, et al. Numerical Optimization, 2001, J. Oper. Res. Soc.

[37] David A. Ferrucci, et al. UIMA: an architectural approach to unstructured information processing in the corporate research environment, 2004, Natural Language Engineering.

[38] Stephen J. Wright, et al. Numerical Optimization, 2018, Fundamental Statistical Inference.

[39] Jason Baldridge, et al. A Sequencing Model for Situation Entity Classification, 2007, ACL.

[40] Ian H. Witten, et al. Weka: Practical machine learning tools and techniques with Java implementations, 1999.

[41] Annelen Brunner, et al. Automatic recognition of speech, thought, and writing representation in German narrative texts, 2013, Lit. Linguistic Comput.

[42] Carlota S. Smith, et al. Modes of Discourse: The Local Structure of Texts, 2009.

[43] Manfred Pinkal, et al. Annotating genericity: a survey, a scheme, and a corpus, 2015, LAW@NAACL-HLT.

[44] Klaus Krippendorff, et al. Content Analysis: An Introduction to Its Methodology, 1980.

[45] Zeno Vendler, et al. Verbs and Times, 1957, The Language of Time - A Reader.

[46] Christiane Fellbaum, et al. The Manually Annotated Sub-Corpus: A Community Resource for and by the People, 2010, ACL.

[47] Philip L. Peterson, et al. On Representing Event Reference, 1997.