Towards Understanding Omission in Dialogue Summarization

Dialogue summarization aims to condense a lengthy dialogue into a concise summary and has recently achieved significant progress. However, the results of existing methods are still far from satisfactory. Previous works have indicated that omission is a major factor affecting summarization quality, yet few have explored the omission problem further, such as how omission affects summarization results and how to detect omission, which is critical for reducing omission and improving summarization quality. Moreover, analyzing and detecting omission relies on summarization datasets with omission labels (i.e., labels indicating which dialogue utterances are omitted in the summary), which are not available in the current literature. In this paper, we propose the OLDS dataset, which provides high-quality omission labels for dialogue summarization. By analyzing this dataset, we find that a large improvement in summarization quality can be achieved by providing ground-truth omission labels for the summarization model to recover the omitted information, demonstrating the importance of omission detection for omission mitigation in dialogue summarization. We therefore formulate an omission detection task and show that our proposed dataset supports the training and evaluation of this task well. We also call for research on omission detection based on our proposed dataset. Our dataset and code are publicly available.
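To make the task formulation concrete, below is a minimal sketch of omission detection framed as per-utterance binary labeling: given a dialogue and a generated summary, flag the utterances whose content is missing from the summary. The lexical-overlap heuristic, the function names, and the 0.3 threshold are all illustrative assumptions, not the paper's method or the OLDS labeling procedure.

# A minimal, hypothetical sketch of omission detection as per-utterance
# binary labeling. The overlap heuristic and the 0.3 threshold are
# illustrative assumptions, not the OLDS labeling procedure.
import re


def _tokens(text: str) -> set[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def omitted_utterances(dialogue: list[str], summary: str,
                       threshold: float = 0.3) -> list[int]:
    """Return indices of dialogue utterances judged omitted from the summary.

    An utterance counts as omitted when too small a fraction of its
    tokens appears in the summary (a naive lexical-overlap proxy).
    """
    summary_tokens = _tokens(summary)
    omitted = []
    for i, utterance in enumerate(dialogue):
        tokens = _tokens(utterance)
        if not tokens:
            continue
        coverage = len(tokens & summary_tokens) / len(tokens)
        if coverage < threshold:
            omitted.append(i)
    return omitted


if __name__ == "__main__":
    dialogue = [
        "Amanda: I baked cookies. Do you want some?",
        "Jerry: Sure! I'll come by tomorrow.",
        "Amanda: Great, see you then.",
    ]
    summary = "Amanda baked cookies."
    print(omitted_utterances(dialogue, summary))  # -> [1, 2]

In practice, such lexical heuristics are weak; the abstract's point is precisely that omission labels like those in OLDS make it possible to train and evaluate learned omission detectors instead.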
