Improving the Annotation of Sentence Specificity

We introduce improved guidelines for the annotation of sentence specificity, addressing issues encountered in prior work. Our annotation provides judgements of sentences in context. Rather than binary judgements, we introduce a specificity scale that accommodates nuanced judgements. Our augmented annotation procedure also allows us to identify where in the discourse context the lack of specificity can be resolved. In addition, the cause of the underspecification is annotated in the form of free-text questions. We present results from a pilot annotation with this new scheme and demonstrate good inter-annotator agreement. We find that the lack of specificity is distributed evenly among the immediate prior context, long-distance prior context, and no prior context. We also find that missing details that are not resolved in the prior context are more likely to trigger questions about the reason behind events, “why” and “how”. Our data is accessible at http://www.cis.upenn.edu/~nlp/corpora/lrec16spec.html
