MCDTB: A Macro-level Chinese Discourse TreeBank

In view of the differences between the annotation of micro-level and macro-level discourse relationships, this paper describes the construction of the Macro Chinese Discourse Treebank (MCDTB), a higher-level Chinese discourse corpus, and the related experiments. Following RST (Rhetorical Structure Theory), we annotate macro discourse information (discourse structure, nuclearity, and relationship) together with additional discourse information (topic sentences, lead, and abstract) to make the macro discourse annotation more objective and accurate. In total, we annotated 720 articles with a Kappa value greater than 0.6. Preliminary experiments on this corpus verify the computability of MCDTB.
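To illustrate what a Kappa value above 0.6 means for annotation agreement, the following is a minimal sketch of Cohen's kappa for two annotators. The abstract does not say which Kappa variant was used, so Cohen's kappa is an assumption here, and the nuclearity labels (`NS`, `SN`, `NN`) and the two label sequences are hypothetical illustration data, not taken from MCDTB.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators, nominal labels. The paper's exact
    agreement measure is not specified in the abstract.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label rates.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical nuclearity decisions for ten discourse relations.
annotator_a = ["NS", "NS", "SN", "NN", "NS", "NN", "SN", "NS", "NN", "NS"]
annotator_b = ["NS", "NS", "SN", "NN", "SN", "NN", "SN", "NS", "NS", "NS"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

In this toy example the annotators agree on 8 of 10 items, but kappa is lower than 0.8 because part of that agreement is expected by chance given how often each label is used.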
