Predicting Discourse Trees from Transformer-based Neural Summarizers

Previous work indicates that discourse information benefits summarization. In this paper, we explore whether this synergy between discourse and summarization is bidirectional by inferring document-level discourse trees from pre-trained neural summarizers. In particular, we generate unlabeled RST-style discourse trees from the self-attention matrices of the transformer model. Experiments across models and datasets reveal that the summarizer learns both dependency- and constituency-style discourse information, which is typically encoded in a single head and covers both long- and short-distance discourse dependencies. Overall, the experimental results suggest that the learned discourse information is general and transferable across domains.
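As a rough illustration of the idea (a minimal sketch, not the paper's exact procedure), a dependency-style discourse tree can be read off a single self-attention head by aggregating token-to-token attention into EDU-to-EDU scores and then extracting a maximum spanning tree over those scores with the Chu-Liu/Edmonds algorithm. The names attn and edu_spans below are assumptions introduced for the example, not identifiers from the paper.

    # Sketch: induce an unlabeled dependency-style discourse tree from one
    # self-attention head, assuming EDU segmentation is given as token spans.
    import numpy as np
    import networkx as nx

    def edu_attention_scores(attn, edu_spans):
        """Aggregate a token-level attention matrix (seq_len x seq_len) into
        EDU-to-EDU scores by averaging attention mass between EDU token spans."""
        n = len(edu_spans)
        scores = np.zeros((n, n))
        for i, (si, ei) in enumerate(edu_spans):
            for j, (sj, ej) in enumerate(edu_spans):
                if i != j:
                    scores[i, j] = attn[si:ei, sj:ej].mean()
        return scores

    def attention_to_dependency_tree(scores):
        """Treat scores[h, d] as the weight of a head -> dependent arc and
        return the arcs of a maximum spanning arborescence (Chu-Liu/Edmonds)."""
        g = nx.DiGraph()
        n = scores.shape[0]
        for h in range(n):
            for d in range(n):
                if h != d:
                    g.add_edge(h, d, weight=float(scores[h, d]))
        tree = nx.maximum_spanning_arborescence(g)
        return sorted(tree.edges())

    # Toy usage: 4 EDUs over a 12-token document, with a random matrix
    # standing in for one attention head of the summarizer.
    attn = np.random.rand(12, 12)
    edu_spans = [(0, 3), (3, 6), (6, 9), (9, 12)]
    print(attention_to_dependency_tree(edu_attention_scores(attn, edu_spans)))

A constituency-style tree could analogously be induced by scoring spans with the same aggregated attention and running a CKY-style decoder, but the sketch above only covers the dependency case.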
