Gold Standard Annotations for Preposition and Verb Sense with Semantic Role Labels in Adult-Child Interactions

This paper describes the augmentation of an existing corpus of child-directed speech. The resulting corpus is a gold-standard labeled corpus for supervised learning of semantic role labels in adult-child dialogues. Semantic role labeling (SRL) models assign semantic roles to sentence constituents, thus indicating who has done what to whom (and in what way). The current corpus is derived from the Adam files in the Brown corpus (Brown, 1973) of the CHILDES corpora, and augments the partial annotation described in Connor et al. (2010). It provides labels for both semantic arguments of verbs and semantic arguments of prepositions. The semantic role labels and senses of verbs follow Propbank guidelines (Kingsbury and Palmer, 2002; Gildea and Palmer, 2002; Palmer et al., 2005) and those for prepositions follow Srikumar and Roth (2011). The corpus was annotated by two annotators. Inter-annotator agreement is given separately for prepositions and verbs, and for adult speech and child speech. Overall, across child and adult samples, including verbs and prepositions, the kappa score is 72.6 for sense, 77.4 for the number of semantic-role-bearing arguments, 91.1 for identical semantic role labels on a given argument, and 93.9 for the span of semantic role labels. The sense and number of arguments were often open to multiple interpretations in child speech, due to the rapidly changing discourse and omission of constituents in production. Annotators used a discourse context window of ten sentences before and ten sentences after the target utterance to determine the annotation labels. The derived corpus is available for use in CHAT (MacWhinney, 2000) and XML formats.
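As an illustration of how an agreement figure like the kappa scores above can be computed for two annotators, the following is a minimal sketch of Cohen's kappa over paired labels. The abstract does not state which kappa variant was used, so Cohen's kappa is an assumption here, as are the function name `cohen_kappa` and the toy role labels; the paper's figures appear to be on a 0 to 100 scale, which this sketch mirrors by multiplying by 100 when printing.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # If both annotators always use the same single label, chance agreement
    # is 1 and kappa is undefined; treat that as perfect agreement.
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example with made-up semantic role labels for four arguments.
annotator_a = ["A0", "A1", "A0", "AM-LOC"]
annotator_b = ["A0", "A1", "A1", "AM-LOC"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {100 * kappa:.1f}")  # prints "kappa = 63.6"
```

In the toy example the annotators agree on three of four labels (observed agreement 0.75), chance agreement is 5/16, and kappa is therefore 7/11. Because kappa corrects for chance, it can be noticeably lower than raw percentage agreement when a few labels dominate.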

[1] Beatrice Santorini, et al. Part-of-Speech Tagging Guidelines for the Penn Treebank Project (3rd Revision), 1990.

[2] Beatrice Santorini. Part-of-speech tagging guidelines for the Penn Treebank project, 1990.

[3] Jill Lany, et al. From Statistics to Meaning, 2010, Psychological Science.

[4] Dan Roth, et al. Starting from Scratch in Semantic Role Labeling, 2010, ACL.

[5] Martha Palmer, et al. Current Directions in English and Arabic PropBank, 2017.

[6] David R. Dowty. Thematic proto-roles and argument selection, 1991.

[7] Daniel Gildea, et al. The Necessity of Parsing for Predicate Argument Recognition, 2002, ACL.

[8] Ralph Grishman, et al. Annotating Noun Argument Structure for NomBank, 2004, LREC.

[9] A. Lavie, et al. Morphosyntactic annotation of CHILDES transcripts, 2010, Journal of Child Language.

[10] Vivek Srikumar, et al. The semantics of role labeling, 2013.

[11] Dan Roth, et al. Minimal supervision for language learning: bootstrapping global patterns from local knowledge, 2011.

[12] Dan Roth, et al. Starting from Scratch in Semantic Role Labeling: Early Indirect Supervision, 2013, Cognitive Aspects of Computational Language Acquisition.

[13] Dan Roth, et al. A Joint Model for Extended Semantic Role Labeling, 2011, EMNLP.

[14] C. Fisher, et al. Predicted errors in children’s early sentence comprehension, 2012, Cognition.

[15] Alon Lavie, et al. High-accuracy Annotation and Parsing of CHILDES Transcripts, 2007.

[16] Martha Palmer, et al. PropBank: Semantics of New Predicate Types, 2014, LREC.

[17] C. Fisher, et al. Learning Words and Rules, 2006, Psychological Science.

[18] Martha Palmer, et al. From TreeBank to PropBank, 2002, LREC.

[19] Sylvia Yuan, et al. Syntactic bootstrapping, 2010, Wiley Interdisciplinary Reviews: Cognitive Science.

[20] John B. Lowe, et al. The Berkeley FrameNet Project, 1998, ACL.

[21] Martha Palmer, et al. Propbank Instance Annotation Guidelines Using a Dedicated Editor, Jubilee, 2010, LREC.

[22] Vivek Srikumar, et al. A corpus of preposition supersenses in English web reviews, 2016, ArXiv.

[23] B. MacWhinney. The CHILDES project: tools for analyzing talk, 1992.

[24] R. Brown, et al. A First Language, 1973.

[25] Lisa Pearl, et al. Syntactic Islands and Learning Biases: Combining Experimental Syntax and Computational Modeling to Investigate the Language Acquisition Problem, 2013.

[26] Michael C. Frank, et al. Continuity of Discourse Provides Information for Word Learning, 2007.

[27] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[28] Rushen Shi, et al. Syntactic Categorization in French-Learning Infants, 2010, Infancy: the official journal of the International Society on Infant Studies.

[29] Ann Bies, et al. Bracketing Guidelines For Treebank II Style Penn Treebank Project, 1995.