Building a Macro Chinese Discourse Treebank

Discourse structure analysis is an important research topic in natural language processing. It not only helps in understanding the structure and semantics of a text, but also supports downstream applications such as automatic summarization, statistical machine translation, and question answering. At present, research on discourse structure concentrates mainly on the micro level (relations among clauses and sentences), while work on the macro level remains scarce. This paper therefore focuses on building a representation schema and corpus resources for macro discourse structure. It proposes a macro discourse structure framework that describes a text through two complementary structures: a logical semantic structure and a functional pragmatic structure. On this basis, a macro Chinese discourse treebank consisting of 147 newswire articles is annotated. Preliminary experimental results show that the representation schema and corpus resource constructed in this paper can lay a foundation for further analysis of macro discourse structure.
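
To make the idea of a treebank with two annotation layers concrete, here is a minimal sketch of one possible in-memory representation. It rests on assumptions not stated in the abstract: leaves are taken to be paragraphs, internal nodes carry a logical semantic relation label, and any span may carry a functional pragmatic role label. The class `MacroNode`, the helper `is_well_formed`, and the labels "Elaboration", "Joint", "Lead", and "Supplement" are hypothetical placeholders, not the paper's actual schema or tag set.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MacroNode:
    """A node in a hypothetical macro discourse tree.

    Assumption: leaves stand for paragraphs; internal nodes join adjacent spans.
    """
    paragraph: Optional[int] = None   # leaf only: 0-based paragraph index
    function: Optional[str] = None    # functional pragmatic role (placeholder labels)
    relation: Optional[str] = None    # logical semantic relation (internal nodes only)
    children: List[MacroNode] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def leaves(self) -> List[Optional[int]]:
        """Paragraph indices covered by this node, left to right."""
        if self.is_leaf():
            return [self.paragraph]
        out: List[Optional[int]] = []
        for child in self.children:
            out.extend(child.leaves())
        return out

    def depth(self) -> int:
        """Number of nodes on the longest root-to-leaf path."""
        if self.is_leaf():
            return 1
        return 1 + max(child.depth() for child in self.children)


def is_well_formed(root: MacroNode, n_paragraphs: int) -> bool:
    """Check that the leaves cover paragraphs 0..n-1 exactly once and in order,
    and that every internal node has at least two children and a relation label."""
    if root.leaves() != list(range(n_paragraphs)):
        return False
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.is_leaf():
            if node.relation is None or len(node.children) < 2:
                return False
            stack.extend(node.children)
    return True


# Usage: a three-paragraph article (labels are illustrative only).
tree = MacroNode(
    relation="Elaboration",
    children=[
        MacroNode(paragraph=0, function="Lead"),
        MacroNode(
            relation="Joint",
            function="Supplement",
            children=[MacroNode(paragraph=1), MacroNode(paragraph=2)],
        ),
    ],
)
print(tree.leaves())             # [0, 1, 2]
print(tree.depth())              # 3
print(is_well_formed(tree, 3))   # True
```

Keeping the relation label on internal nodes and the role label on any span lets both layers share one tree; a real schema might instead store them as two separate structures over the same paragraphs.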
