Building Chinese Discourse Corpus with Connective-driven Dependency Tree Structure

In this paper, we propose a Connective-driven Dependency Tree (CDT) scheme to represent discourse rhetorical structure in Chinese, with elementary discourse units as leaf nodes and connectives as non-leaf nodes. The scheme is largely motivated by the Penn Discourse Treebank and Rhetorical Structure Theory. In particular, connectives directly represent both the hierarchy of the tree structure and the rhetorical relations of a discourse, while the nuclei of discourse units are determined globally with reference to dependency theory. Guided by the CDT scheme, we manually annotate a Chinese Discourse Treebank (CDTB) of 500 documents. Preliminary evaluation supports the appropriateness of the CDT scheme for Chinese discourse analysis and the usefulness of our manually annotated CDTB corpus.
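To make the tree shape concrete, the following is a minimal sketch of how a CDT-style structure could be held in memory, with elementary discourse units as leaves and connectives as internal nodes. It is an illustration only, not the CDTB annotation format: the class names (`EDU`, `ConnectiveNode`), the `relation` labels, the `nucleus` index encoding, and the English example text are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class EDU:
    """Leaf node: one elementary discourse unit (hypothetical class)."""
    text: str


@dataclass
class ConnectiveNode:
    """Internal node: a connective joining two or more discourse units.

    `relation` and `nucleus` are assumed encodings for illustration:
    `nucleus` is the index of the child treated as the nucleus.
    """
    connective: str
    relation: str
    children: List["Node"]
    nucleus: int = 0


Node = Union[EDU, ConnectiveNode]


def edus(node: Node) -> List[str]:
    """Return the EDU texts under `node` in left-to-right order."""
    if isinstance(node, EDU):
        return [node.text]
    collected: List[str] = []
    for child in node.children:
        collected.extend(edus(child))
    return collected


def depth(node: Node) -> int:
    """Number of connective levels above the deepest EDU."""
    if isinstance(node, EDU):
        return 0
    return 1 + max(depth(child) for child in node.children)


# A two-level example: the inner connective attaches below the outer one,
# so the connectives themselves carry the hierarchy.
inner = ConnectiveNode(
    connective="because",
    relation="cause",  # assumed label, not taken from the CDTB relation set
    children=[EDU("the road was closed"), EDU("it had snowed")],
    nucleus=0,
)
root = ConnectiveNode(
    connective="but",
    relation="contrast",  # assumed label
    children=[EDU("we left early"), inner],
    nucleus=1,
)
```

In this sketch, `edus(root)` recovers the linear sequence of discourse units, while `depth(root)` reflects how many connective levels the tree contains.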
