QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization

Meetings are a key component of human collaboration. As increasing numbers of meetings are recorded and transcribed, meeting summaries have become essential for informing people, whether or not they attended, of the key decisions made and the tasks to be completed. However, it is hard to create a single short summary that covers all the content of a long meeting involving multiple people and topics. To satisfy the needs of different types of users, we define a new query-based multi-domain meeting summarization task, in which models must select and summarize relevant spans of a meeting in response to a query, and we introduce QMSum, a new benchmark for this task. QMSum consists of 1,808 query-summary pairs over 232 meetings in multiple domains. We also investigate a locate-then-summarize method and evaluate a set of strong summarization baselines on the task. Experimental results and manual analysis reveal that QMSum presents significant challenges in long meeting summarization for future research. The dataset is available at https://github.com/Yale-LILY/QMSum.
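To make the locate-then-summarize idea concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes a meeting is a list of speaker turns, uses plain word overlap with the query as a stand-in locator, and uses concatenation as a placeholder summarizer. The function names (`locate`, `summarize`, `locate_then_summarize`) and the `top_k` parameter are hypothetical and chosen only for illustration.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def locate(query: str, turns: List[str], top_k: int = 2) -> List[int]:
    """Return indices of up to top_k turns that share the most words with the query.

    Assumption: word overlap is only a toy relevance score; a real locator
    would be a trained model. Indices are returned in meeting order.
    """
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(turn)), i) for i, turn in enumerate(turns)]
    # Highest overlap first; earlier turns win ties.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    chosen = [i for score, i in ranked[:top_k] if score > 0]
    return sorted(chosen)


def summarize(spans: List[str]) -> str:
    """Placeholder summarizer that joins the located spans.

    Assumption: a real system would pass the spans (and the query) to an
    abstractive summarization model instead.
    """
    return " ".join(spans)


def locate_then_summarize(query: str, turns: List[str], top_k: int = 2) -> str:
    """Stage 1: locate query-relevant turns. Stage 2: summarize only those turns."""
    indices = locate(query, turns, top_k)
    return summarize([turns[i] for i in indices])


if __name__ == "__main__":
    meeting = [
        "PM: The budget for the remote control is 12 euros.",
        "ID: I like the yellow case.",
        "ME: We decided the remote control will use a rubber case.",
    ]
    question = "What was decided about the remote control case?"
    print(locate_then_summarize(question, meeting, top_k=1))
```

Splitting the pipeline this way keeps the summarizer's input short even when the full transcript is long, which is the motivation for locating relevant spans first.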
