System Combination for Multi-document Summarization

We present a novel framework of system combination for multi-document summarization. For each input set (input), we generate candidate summaries by combining whole sentences from the summaries produced by different systems. We show that the oracle among these candidates is much better than the summaries that we combined. We then present a supervised model to select among the candidates. The model relies on a rich set of features that capture content importance from different perspectives. In both manual and automatic evaluations, our model outperforms the systems that we combined. We also achieve very competitive performance on six DUC/TAC datasets, comparable to the state of the art on most datasets.
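The candidate-generation and oracle steps described above can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: the function names (`candidate_summaries`, `unigram_recall`, `oracle`), the `max_words` budget, the exhaustive enumeration, and the use of plain unigram recall against a single reference as a stand-in for a real evaluation metric are all hypothetical choices made for illustration. The supervised selection model is not sketched here.

```python
from itertools import combinations


def candidate_summaries(system_summaries, max_words=100):
    """Enumerate candidate summaries built from whole sentences.

    system_summaries: list of summaries, each a list of sentence strings.
    max_words: assumed length budget (hypothetical, for illustration only).
    """
    # Pool the sentences from all systems, dropping exact duplicates
    # while preserving first-seen order.
    pool = []
    for summary in system_summaries:
        for sentence in summary:
            if sentence not in pool:
                pool.append(sentence)

    # Keep every combination of pooled sentences that fits the budget.
    candidates = []
    for k in range(1, len(pool) + 1):
        for combo in combinations(pool, k):
            if sum(len(s.split()) for s in combo) <= max_words:
                candidates.append(combo)
    return candidates


def unigram_recall(candidate, reference):
    """Fraction of distinct reference words covered by the candidate.

    A crude stand-in for an automatic metric, not the paper's evaluation.
    """
    ref_words = set(reference.lower().split())
    cand_words = set(" ".join(candidate).lower().split())
    return len(ref_words & cand_words) / len(ref_words) if ref_words else 0.0


def oracle(candidates, reference):
    """Return the candidate that scores highest against the reference."""
    return max(candidates, key=lambda c: unigram_recall(c, reference))


# Toy usage with two hypothetical system outputs.
system_a = ["the storm hit the coast", "officials ordered evacuations"]
system_b = ["the storm hit the coast", "thousands lost power"]
reference = "the storm hit the coast and thousands lost power"

cands = candidate_summaries([system_a, system_b], max_words=8)
best = oracle(cands, reference)
print(len(cands), best)
```

Exhaustive enumeration grows exponentially with the size of the sentence pool, so a practical system would need to restrict the pool or prune the search; the sketch only shows why the oracle over combined candidates can exceed each individual system's summary.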
