Generalizing the semantic roles in the Chinese Proposition Bank

Abstract

The Chinese Proposition Bank (CPB) is a corpus annotated with semantic roles for the arguments of verbal and nominalized predicates. The semantic roles for the core arguments are defined in a predicate-specific manner: a set of numerically identified semantic roles is defined for each sense of a predicate lemma and recorded in a valency lexicon called the frame files. Defining the roles per predicate reduces the cognitive burden on the annotators, since they only need to internalize a few roles at a time, and this has contributed to the consistency of the annotation. It was also a sensible approach given the contentious issue of how many semantic roles are needed if one were to adopt a set of global semantic roles that apply to all predicates. A downside of this approach, however, is that the predicate-specific roles may not be consistent across predicates, and this inconsistency has a negative impact on training automatic systems. Given the progress that has been made in defining semantic roles in the last decade or so, the time is ripe for adopting a set of general semantic roles. In this article, we describe our effort to “re-annotate” the CPB with a set of “global” semantic roles that are predicate-independent, and we investigate their impact on automatic semantic role labeling systems. When defining these global semantic roles, we strive to make them compatible with a recently published ISO standard on the annotation of semantic roles (ISO 24617-4:2014 SemAF-SR) while taking the linguistic characteristics of the Chinese language into account. We show that, in spite of the much larger number of global semantic roles, the accuracy of an off-the-shelf semantic role labeling system retrained on the data re-annotated with global semantic roles is comparable to that of the same system trained on the data set with the original predicate-specific semantic roles. We also argue that the re-annotated data set, together with the original data, provides the user with more flexibility when using the corpus.
