TODSum: Task-Oriented Dialogue Summarization with State Tracking

Previous dialogue summarization datasets mainly focus on open-domain chitchat dialogues, while summarization datasets for widely used task-oriented dialogues have not yet been explored. Automatically summarizing such task-oriented dialogues can help a business collect and review user needs to improve its service. Moreover, previous work focuses on generating summaries with higher ROUGE scores, but it hardly captures the structured information of dialogues and ignores the factuality of summaries. In this paper, we introduce a large-scale public Task-Oriented Dialogue Summarization dataset, TODSum, which aims to summarize the key points of how the agent completes certain tasks with the user. Compared to existing work, TODSum suffers from severely scattered information and requires strict factual consistency, which makes it hard to directly apply recent dialogue summarization models. Therefore, we introduce additional dialogue state knowledge for TODSum to enhance the faithfulness of generated summaries. We hope that a better understanding of conversational content helps summarization models generate concise and coherent summaries. Meanwhile, we establish a comprehensive benchmark for TODSum and propose a state-aware structured dialogue summarization model that integrates dialogue state information and dialogue history. Extensive experiments and qualitative analysis demonstrate the effectiveness of dialogue structure guidance. Finally, we discuss the current issues of TODSum and potential development directions for future work.
