Multi-document Summarization via Budgeted Maximization of Submodular Functions

We treat the text summarization problem as maximizing a submodular function under a budget constraint. We show, both theoretically and empirically, that a modified greedy algorithm can efficiently solve the budgeted submodular maximization problem near-optimally, and we derive new approximation bounds in doing so. Experiments on the DUC'04 task show that our approach outperforms the best-performing method from the DUC'04 evaluation on ROUGE-1 scores.
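The abstract does not spell out the modified greedy algorithm, so the following is only a minimal sketch of one common form of budgeted greedy selection, under stated assumptions: candidates are ranked by marginal gain divided by cost raised to a scaling exponent `r`, infeasible or non-improving candidates are discarded, and the result is compared against the best single feasible item. The names `budgeted_greedy` and `coverage`, the exponent `r`, and the toy concept-coverage objective are all hypothetical illustrations, not the paper's exact formulation or objective.

```python
from typing import Callable, Dict, List, Set


def budgeted_greedy(
    costs: Dict[str, float],
    f: Callable[[List[str]], float],
    budget: float,
    r: float = 1.0,
) -> List[str]:
    """Greedy selection under a budget (illustrative sketch, not the paper's code).

    costs  : positive cost of each item (e.g. sentence length)
    f      : set function to maximize, assumed monotone submodular
    budget : maximum total cost of the selected items
    r      : assumed cost-scaling exponent applied to the denominator
    """
    selected: List[str] = []
    spent = 0.0
    candidates = sorted(costs)  # sorted for deterministic tie-breaking
    while candidates:
        base = f(selected)
        # Pick the candidate with the best cost-scaled marginal gain.
        best = max(
            candidates,
            key=lambda v: (f(selected + [v]) - base) / (costs[v] ** r),
        )
        candidates.remove(best)
        gain = f(selected + [best]) - base
        # Keep it only if it fits in the budget and actually helps.
        if spent + costs[best] <= budget and gain > 0:
            selected.append(best)
            spent += costs[best]
    # Guard against the greedy set being worse than one expensive item.
    singles = [v for v in costs if costs[v] <= budget]
    if singles:
        top = max(singles, key=lambda v: f([v]))
        if f([top]) > f(selected):
            return [top]
    return selected


def coverage(concepts: Dict[str, Set[str]]) -> Callable[[List[str]], float]:
    """Toy monotone submodular objective: number of distinct concepts covered."""
    def f(items: List[str]) -> float:
        covered: Set[str] = set()
        for item in items:
            covered |= concepts[item]
        return float(len(covered))
    return f


# Hypothetical sentences, the concepts each one covers, and their costs.
concepts = {
    "s1": {"a", "b", "c"},
    "s2": {"a", "b"},
    "s3": {"c", "d"},
    "s4": {"e"},
}
costs = {"s1": 3.0, "s2": 1.0, "s3": 2.0, "s4": 1.0}
summary = budgeted_greedy(costs, coverage(concepts), budget=4.0)
```

The final comparison with the best single item matters because ranking by gain per unit cost alone can fill the budget with cheap, low-value items and leave no room for one expensive but highly valuable item.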
