Predicting Discourse Trees from Transformer-based Neural Summarizers

Previous work indicates that discourse information benefits summarization. In this paper, we explore whether this synergy between discourse and summarization is bidirectional by inferring document-level discourse trees from pre-trained neural summarizers. In particular, we generate unlabeled RST-style discourse trees from the self-attention matrices of the transformer model. Experiments across models and datasets reveal that the summarizer learns both dependency- and constituency-style discourse information, which is typically encoded in a single head and covers both long- and short-distance discourse dependencies. Overall, the experimental results suggest that the learned discourse information is general and transferable across domains.
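As a rough illustration of the idea (a minimal sketch, not the paper's exact procedure), a dependency-style discourse tree can be read off a single self-attention head by aggregating token-to-token attention into EDU-to-EDU scores and then extracting a maximum spanning tree over those scores with the Chu-Liu/Edmonds algorithm. The names attn and edu_spans below are assumptions introduced for the example, not identifiers from the paper.

    # Sketch: induce an unlabeled dependency-style discourse tree from one
    # self-attention head, assuming EDU segmentation is given as token spans.
    import numpy as np
    import networkx as nx

    def edu_attention_scores(attn, edu_spans):
        """Aggregate a token-level attention matrix (seq_len x seq_len) into
        EDU-to-EDU scores by averaging attention mass between EDU token spans."""
        n = len(edu_spans)
        scores = np.zeros((n, n))
        for i, (si, ei) in enumerate(edu_spans):
            for j, (sj, ej) in enumerate(edu_spans):
                if i != j:
                    scores[i, j] = attn[si:ei, sj:ej].mean()
        return scores

    def attention_to_dependency_tree(scores):
        """Treat scores[h, d] as the weight of a head -> dependent arc and
        return the arcs of a maximum spanning arborescence (Chu-Liu/Edmonds)."""
        g = nx.DiGraph()
        n = scores.shape[0]
        for h in range(n):
            for d in range(n):
                if h != d:
                    g.add_edge(h, d, weight=float(scores[h, d]))
        tree = nx.maximum_spanning_arborescence(g)
        return sorted(tree.edges())

    # Toy usage: 4 EDUs over a 12-token document, with a random matrix
    # standing in for one attention head of the summarizer.
    attn = np.random.rand(12, 12)
    edu_spans = [(0, 3), (3, 6), (6, 9), (9, 12)]
    print(attention_to_dependency_tree(edu_attention_scores(attn, edu_spans)))

A constituency-style tree could analogously be induced by scoring spans with the same aggregated attention and running a CKY-style decoder, but the sketch above only covers the dependency case.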
