Query-based summarization using MDL principle

Query-based text summarization aims to extract from the original text the essential information that answers a given query, presenting it in a minimal, often predefined, number of words. In this paper we introduce a new unsupervised approach to query-based extractive summarization that is based on the minimum description length (MDL) principle and employs the Krimp compression algorithm (Vreeken et al., 2011). The key idea of our approach is to select the frequent word sets related to a given query that compress the document sentences better and therefore describe the document better. A summary is then extracted by selecting the sentences that best cover these query-related frequent word sets. The approach is evaluated on the DUC 2005 and DUC 2006 datasets, which are specifically designed for query-based summarization (DUC, 2005; DUC, 2006). Its results are competitive with the best reported results on these datasets.
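The pipeline above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each sentence is treated as a transaction (a set of words), and candidate word sets of two or three words are mined by brute force. A basic Krimp loop without pruning keeps a candidate only if it lowers the total encoded size L(D|CT) + L(CT). Several pieces are hypothetical choices made for illustration: the function names (`tokenize`, `frequent_itemsets`, `cover`, `total_size`, `krimp`, `summarize`), the tiny stopword list, the `word_budget` and `min_support` parameters, the rule that a word set is "query-related" when it shares at least one word with the query, and the greedy coverage step.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical preprocessing: a tiny stopword list and punctuation stripping, no stemming.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "by", "with"}


def tokenize(text):
    """Turn a sentence (or query) into a transaction: a set of lowercase words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return frozenset(w for w in words if w and w not in STOPWORDS)


def frequent_itemsets(transactions, min_support=2, max_len=3):
    """Brute-force mining of word sets of size 2..max_len (Apriori would scale better)."""
    counts = Counter()
    for t in transactions:
        for k in range(2, max_len + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def cover(transaction, code_table):
    """Krimp-style cover: greedily use code-table word sets that fit the uncovered words."""
    remaining, used = set(transaction), []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
        if not remaining:
            break
    return used


def total_size(transactions, code_table, singleton_len):
    """L(D | CT) + L(CT) in bits, with Shannon-optimal code lengths from cover usage."""
    usage = Counter()
    for t in transactions:
        usage.update(cover(t, code_table))
    total = sum(usage.values())
    code_len = {s: -math.log2(u / total) for s, u in usage.items()}
    data_bits = sum(u * code_len[s] for s, u in usage.items())
    # Each used word set costs its own code plus its spelling in the singleton code.
    table_bits = sum(code_len[s] + sum(singleton_len[w] for w in s) for s in usage)
    return data_bits + table_bits


def krimp(transactions, candidates):
    """Basic Krimp (no pruning): keep a candidate only if it shrinks the total size."""
    freq = Counter(w for t in transactions for w in t)
    n = sum(freq.values())
    singleton_len = {w: -math.log2(c / n) for w, c in freq.items()}
    singletons = [frozenset([w]) for w in sorted(freq)]
    accepted = []
    best = total_size(transactions, singletons, singleton_len)
    # Standard candidate order: support desc, length desc, then lexicographic.
    for cand in sorted(candidates, key=lambda s: (-candidates[s], -len(s), sorted(s))):
        # Standard cover order for the trial table: length desc, support desc, lexicographic.
        trial = sorted(accepted + [cand], key=lambda s: (-len(s), -candidates[s], sorted(s)))
        size = total_size(transactions, trial + singletons, singleton_len)
        if size < best:
            best, accepted = size, trial
    return accepted


def summarize(sentences, query, word_budget=40, min_support=2):
    transactions = [tokenize(s) for s in sentences]
    q = tokenize(query)
    # Assumption: a word set is "query-related" if it shares at least one word with the query.
    cands = {s: c for s, c in frequent_itemsets(transactions, min_support).items() if s & q}
    uncovered = set(krimp(transactions, cands))
    chosen, length = [], 0
    while True:
        # Greedy coverage: the sentence covering most uncovered word sets that fits the budget.
        best_i, best_gain = None, 0
        for i, t in enumerate(transactions):
            n_words = len(sentences[i].split())
            gain = sum(1 for s in uncovered if s <= t)
            if i not in chosen and length + n_words <= word_budget and gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:
            break
        chosen.append(best_i)
        length += len(sentences[best_i].split())
        uncovered = {s for s in uncovered if not s <= transactions[best_i]}
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    docs = [
        "Solar power reduces energy costs for homes.",
        "Solar power adoption grew quickly last year.",
        "Wind power also reduces energy costs.",
        "The city council met on Tuesday.",
    ]
    print(summarize(docs, "How does solar power reduce energy costs?"))
    # → ['Solar power reduces energy costs for homes.']
```

The greedy loop is only a simple stand-in for "selecting sentences that best cover query-related frequent word sets." It stops when no remaining sentence both fits the word budget and covers a still-uncovered word set. A more realistic version would add stemming, a proper frequent-itemset miner, and Krimp's pruning step.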

[1] Ani Nenkova, et al. References to Named Entities: a Corpus Study, 2003, HLT-NAACL.

[2] Jennifer Williams, et al. Finding Good Enough: A Task-Based Evaluation of Query Biased Summarization for Cross-Language Information Retrieval, 2014, EMNLP.

[3] Dragomir R. Radev, et al. Biased LexRank: Passage retrieval using random walks with question-based priors, 2009, Inf. Process. Manag.

[4] Frank Schilder, et al. FastSum: Fast and Accurate Query-based Multi-document Summarization, 2008, ACL.

[5] Tadashi Nomoto, et al. Machine Learning Approaches to Rhetorical Parsing and Open-Domain Text Summarization, 2004.

[6] Vasudeva Varma, et al. Query Independent Sentence Scoring approach to DUC 2006, 2006.

[7] Daniel Marcu, et al. Bayesian Query-Focused Summarization, 2006, ACL.

[8] Vasudeva Varma, et al. A Relevance-Based Language Modeling approach to DUC 2005, 2005.

[9] Wauter Bosma. Query-Based Summarization using Rhetorical Structure Theory, 2004, CLIN.

[10] Sanguthevar Rajasekaran, et al. Query-Based Summarization Based on Document Graphs, 2006.

[11] Panayiotis Tsaparas, et al. Review Synthesis for Micro-Review Summarization, 2015, WSDM.

[12] Yan Liu, et al. Query-Oriented Multi-Document Summarization via Unsupervised Deep Learning, 2012, AAAI.

[13] Sun Park, et al. Query Based Summarization Using Non-negative Matrix Factorization, 2006, KES.

[14] Jie Tang, et al. Multi-topic Based Query-Oriented Summarization, 2009, SDM.

[15] Mark Last, et al. Krimping texts for better summarization, 2015, EMNLP.

[16] Jilles Vreeken, et al. Krimp: mining itemsets that compress, 2011, Data Mining and Knowledge Discovery.

[17] Yuji Matsumoto, et al. A new approach to unsupervised text summarization, 2001, SIGIR '01.

[18] Tat-Seng Chua, et al. NUS at DUC 2005: Understanding Documents via Concept Links, 2005.

[19] Ping Chen, et al. A Query-Based Medical Information Summarization System Using Ontology Knowledge, 2006, 19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06).

[20] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL 2004.

[21] Sasha Blair-Goldensohn. From Definitions to Complex Topics: Columbia University at DUC 2005, 2005.

[22] Rakesh Agrawal, et al. Fast Algorithms for Mining Association Rules, 1994, VLDB 1994.

[23] Claire Cardie, et al. Query-Focused Opinion Summarization for User-Generated Content, 2014, COLING.

[24] Lin Zhao, et al. Using External Resources and Joint Learning for Bigram Weighting in ILP-Based Multi-Document Summarization, 2015, NAACL.

[25] Laks V. S. Lakshmanan, et al. MDL Summarization with Holes, 2005, VLDB.

[26] Liang Zhou, et al. Summarizing Answers for Complicated Questions, 2006, LREC.

[27] Sasha Blair-Goldensohn, et al. Answering Definitional Questions: A Hybrid Approach, 2004, New Directions in Question Answering.

[28] Laks V. S. Lakshmanan, et al. The Generalized MDL Approach for Summarization, 2002, VLDB.