Hierarchical Macro Discourse Parsing Based on Topic Segmentation

Hierarchically constructing micro (i.e., intra-sentence or inter-sentence) discourse structure trees using explicit boundaries (e.g., sentence and paragraph boundaries) has proven to be an effective strategy. However, this strategy is difficult to apply to document-level macro (i.e., inter-paragraph) discourse parsing, a more challenging task, because explicit boundaries are lacking at the higher level. To alleviate this issue, we introduce a topic segmentation mechanism that detects implicit topic boundaries and thereby helps the document-level macro discourse parser construct better discourse trees hierarchically. In particular, our parser first splits a document into several sections at the topic boundaries detected by topic segmentation. It then builds a smaller and more accurate discourse sub-tree within each section and sequentially combines the sub-trees into a whole tree for the document. Experimental results on both the Chinese MCDTB and the English RST-DT show that our proposed method significantly outperforms state-of-the-art baselines.
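The split-then-build procedure described above can be illustrated with a minimal sketch. It is not the parser from the paper: the names `Node`, `split_by_boundaries`, `build_left_to_right`, and `parse_document` are hypothetical, the topic boundaries are assumed to be given as paragraph indices at which a new section starts, and a simple left-to-right merge stands in for whatever learned model decides the tree structure.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set


@dataclass
class Node:
    """A discourse tree node covering paragraphs start..end (inclusive)."""
    start: int
    end: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def split_by_boundaries(n_paragraphs: int, boundaries: Set[int]) -> List[List[int]]:
    """Group paragraph indices into sections.

    Assumption: a boundary b means a new section starts at paragraph b.
    """
    sections: List[List[int]] = []
    current: List[int] = []
    for i in range(n_paragraphs):
        if i in boundaries and current:
            sections.append(current)
            current = []
        current.append(i)
    if current:
        sections.append(current)
    return sections


def build_left_to_right(nodes: Sequence[Node]) -> Node:
    """Placeholder structure builder: merge adjacent nodes sequentially.

    A real parser would choose merge points with a trained model.
    """
    if not nodes:
        raise ValueError("cannot build a tree from zero nodes")
    tree = nodes[0]
    for nxt in nodes[1:]:
        tree = Node(tree.start, nxt.end, tree, nxt)
    return tree


def parse_document(n_paragraphs: int, boundaries: Set[int]) -> Node:
    """Build one sub-tree per section, then combine the sub-trees in order."""
    sections = split_by_boundaries(n_paragraphs, boundaries)
    subtrees = [build_left_to_right([Node(i, i) for i in sec]) for sec in sections]
    return build_left_to_right(subtrees)


# Example: five paragraphs with a topic boundary before paragraph 2.
root = parse_document(5, {2})
```

In this example the root has one child spanning paragraphs 0 to 1 and another spanning paragraphs 2 to 4, so no sub-tree crosses the topic boundary, which is the constraint the hierarchical strategy is meant to enforce.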
