Multi-document summarization using A* search and discriminative training

In this paper we address two key challenges for extractive multi-document summarization: the search problem of finding the best-scoring summary and the training problem of learning the best model parameters. We propose an A* search algorithm that finds the best extractive summary up to a given length; the search is both optimal and efficient to run. Further, we propose a discriminative training algorithm that directly maximises the quality of the best summary, rather than assuming a sentence-level decomposition as in earlier work. Our approach leads to significantly better results than earlier techniques across a number of evaluation metrics.
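To make the search problem concrete, the following is a minimal sketch of A* search for a length-bounded extractive summary. It is not the paper's model or heuristic. It assumes a hypothetical scoring function: the summary score is the sum of per-sentence scores minus a non-negative pairwise redundancy penalty (`red`, `penalty`). It also assumes sentence lengths are positive integers. Because penalties can only lower the score, a fractional-knapsack bound that ignores them never underestimates what remains obtainable, so it is an admissible heuristic. Under that assumption, the first "stop" node popped from the priority queue is optimal. All function and parameter names here are illustrative.

```python
import heapq
import itertools
from typing import List, Sequence, Tuple


def upper_bound(scores: Sequence[float], lengths: Sequence[int],
                start: int, room: float) -> float:
    """Fractional-knapsack bound on the score still obtainable from
    sentences start.. within `room` words, ignoring redundancy penalties
    (admissible only if penalties are non-negative)."""
    candidates = sorted(
        (i for i in range(start, len(scores)) if scores[i] > 0),
        key=lambda i: scores[i] / lengths[i],
        reverse=True,
    )
    bound = 0.0
    for i in candidates:
        if room <= 0:
            break
        take = min(1.0, room / lengths[i])  # allow a fractional last sentence
        bound += take * scores[i]
        room -= take * lengths[i]
    return bound


def astar_summary(scores: Sequence[float], lengths: Sequence[int],
                  red: Sequence[Sequence[float]], budget: int,
                  penalty: float = 1.0) -> Tuple[List[int], float]:
    """Return the highest-scoring sentence subset with total length <= budget.

    Assumed score: sum of sentence scores
                   - penalty * sum of red[j][k] over chosen pairs (k before j).
    """
    n = len(scores)
    tie = itertools.count()  # unique tie-breaker so heap tuples never compare further
    # Heap entries: (-f, tie, g, chosen indices, words used, is_goal)
    heap = [(-upper_bound(scores, lengths, 0, budget), next(tie), 0.0, (), 0, False)]
    while heap:
        _, _, g, chosen, used, is_goal = heapq.heappop(heap)
        if is_goal:
            # Its exact score g is >= every remaining upper bound, so it is optimal.
            return list(chosen), g
        # "Stop" action: a goal node whose priority is its exact score.
        heapq.heappush(heap, (-g, next(tie), g, chosen, used, True))
        # Extend only with later sentences, so each subset is generated once.
        last = chosen[-1] if chosen else -1
        for j in range(last + 1, n):
            if used + lengths[j] > budget:
                continue
            gain = scores[j] - penalty * sum(red[j][k] for k in chosen)
            g2 = g + gain
            h2 = upper_bound(scores, lengths, j + 1, budget - used - lengths[j])
            heapq.heappush(heap, (-(g2 + h2), next(tie), g2,
                                  chosen + (j,), used + lengths[j], False))
    return [], 0.0  # not reached: the empty summary is always a goal


# Usage: sentences 1 and 2 are assumed redundant with each other.
scores = [3.0, 2.0, 2.5, 1.0]
lengths = [5, 3, 4, 2]
red = [[0.0] * 4 for _ in range(4)]
red[1][2] = red[2][1] = 1.0
print(astar_summary(scores, lengths, red, budget=7))  # ([0, 3], 4.0)
```

Without the redundancy penalty, sentences 1 and 2 (score 4.5) would win. With it, the search prefers sentences 0 and 3. Adding sentences only in increasing index order keeps the search space a tree over subsets, which avoids revisiting the same summary by different paths.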
