A Determinantal Point Process Based Novel Sampling Method for Abstractive Text Summarization

In recent years, abstractive text summarization (ATS) research has made considerable progress, largely attributable to two key improvements in end-to-end training: deep neural modeling and likelihood-estimation-based sampling. While modeling has converged on a few de facto, highly capable base models built on the encoder-decoder architecture, novel sampling ideas, such as random-masking classification and generative prediction via unsupervised learning, have also been explored. These ideas aim to improve prior knowledge, particularly language-modeling knowledge for downstream tasks, and have led to notable performance gains in ATS. Several challenges remain, however, such as undesirable word repetition. In this paper, we propose a novel determinantal point process (DPP) based sampling method to address this issue; it can be easily integrated into existing ATS models. Our experiments and subsequent analysis show that models trained with our sampling method reduce undesirable word repetition and improve word coverage while achieving competitive ROUGE scores.
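To make the diversity intuition behind a DPP concrete (the abstract does not specify the authors' procedure, so this is only a generic illustration): a DPP scores a subset of candidates by the determinant of a kernel built from per-item quality and pairwise similarity, so selecting near-duplicate items is penalized. The sketch below shows minimal greedy MAP selection for a DPP L-ensemble in NumPy; the function name, the quality and embedding inputs, and the greedy strategy are illustrative assumptions, not the paper's actual sampling method.

```python
import numpy as np

def dpp_greedy_map(embeddings, quality, k):
    """Greedy MAP inference for a DPP L-ensemble over candidate items.

    L[i, j] = quality[i] * sim(i, j) * quality[j], where sim is the cosine
    similarity of the candidate embeddings; det(L_S) rewards subsets that
    are both high-quality and mutually diverse.
    """
    feats = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    L = np.outer(quality, quality) * (feats @ feats.T)

    selected, remaining = [], list(range(len(quality)))
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            # Log-determinant of the enlarged subset; the current subset's
            # log det is the same for every i, so this argmax equals the
            # marginal-gain argmax.
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            score = logdet if sign > 0 else -np.inf
            if score > best_score:
                best, best_score = i, score
        if best is None:  # kernel became singular; no candidate adds diversity
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: pick 3 diverse, high-quality candidates out of 10 random ones.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10, 16))
quality = rng.uniform(0.5, 1.0, size=10)
print(dpp_greedy_map(embeddings, quality, k=3))
```

In a summarization setting, the candidates could be tokens, n-grams, or sentences, with quality and similarity derived from the model; how the proposed method actually couples the DPP with training-time sampling is described in the paper itself.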
