Summarization Through Submodularity and Dispersion

We propose a new optimization framework for summarization that generalizes the submodular framework of Lin and Bilmes (2011). In our framework, the summarization desideratum is expressed as the sum of a submodular function and a non-submodular function, which we call dispersion; the latter uses inter-sentence dissimilarities in different ways to ensure that the summary is non-redundant. We consider three natural dispersion functions and show that a greedy algorithm obtains an approximately optimal summary in all three cases. We conduct experiments on two corpora, DUC 2004 and user comments on news articles, and show that our algorithm outperforms those that rely only on submodularity.
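To make the shape of such an objective concrete, the following is a minimal sketch of greedy selection over a score of the form "submodular term plus weighted dispersion term". It is not the authors' algorithm or their functions: the word-coverage term, the Jaccard dissimilarity, the sum-of-pairwise-distances dispersion, the weight `lam`, and all function names are assumptions chosen for illustration, and the abstract does not say which three dispersion functions the paper studies.

```python
from itertools import combinations


def coverage(selected, sentences):
    """Assumed submodular term: number of distinct words covered."""
    covered = set()
    for i in selected:
        covered |= sentences[i]
    return len(covered)


def jaccard_distance(a, b):
    """Assumed inter-sentence dissimilarity between two word sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dispersion(selected, sentences):
    """Assumed dispersion term: sum of pairwise dissimilarities."""
    return sum(
        jaccard_distance(sentences[i], sentences[j])
        for i, j in combinations(selected, 2)
    )


def greedy_summary(sentences, k, lam=1.0):
    """Pick up to k sentence indices, adding the best-scoring one each round."""

    def objective(subset):
        return coverage(subset, sentences) + lam * dispersion(subset, sentences)

    selected = []
    remaining = set(range(len(sentences)))
    while remaining and len(selected) < k:
        # Ties are broken in favour of the lowest index.
        best = max(sorted(remaining), key=lambda i: objective(selected + [i]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy input: each sentence is represented as a set of words.
example = [{"a", "b", "c"}, {"a", "b", "c"}, {"d", "e"}, {"a"}]
print(greedy_summary(example, k=2))  # prints [0, 2]
```

In the toy input, the second sentence duplicates the first, so it adds neither coverage nor dispersion, and the greedy step prefers the dissimilar third sentence instead.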

[1]  Dragomir R. Radev, et al. Citation Summarization Through Keyphrase Extraction, 2010, COLING.

[2]  Christopher D. Manning, et al. Generating Typed Dependency Parses from Phrase Structure Parses, 2006, LREC.

[3]  Hiroya Takamura, et al. Text Summarization Model based on Maximum Coverage Problem and its Variant, 2008.

[4]  Daniel Marcu, et al. Bayesian Query-Focused Summarization, 2006, ACL.

[5]  Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL 2004.

[6]  Yuli Ye, et al. Max-Sum diversification, monotone submodular functions and dynamic updates, 2012, PODS '12.

[7]  Ted Pedersen, et al. Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts, 2006.

[8]  Hui Lin, et al. A Class of Submodular Functions for Document Summarization, 2011, ACL.

[9]  Jiawei Han, et al. Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions, 2010, COLING.

[10]  Tao Li, et al. Multi-Document Summarization via the Minimum Dominating Set, 2010, COLING.

[11]  ChengXiang Zhai, et al. Comprehensive Review of Opinion Summarization, 2011.

[12]  Ben Taskar, et al. Learning Determinantal Point Processes, 2011, UAI.

[13]  Hui Lin, et al. Learning Mixtures of Submodular Shells with Application to Document Summarization, 2012, UAI.

[14]  Jugal K. Kalita, et al. Summarizing Microblogs Automatically, 2010, NAACL.

[15]  M. L. Fisher, et al. An analysis of approximations for maximizing submodular set functions—I, 1978, Math. Program.

[16]  Dietmar Cieslik. The Steiner Ratio, 2001.

[17]  Koji Yatani, et al. Review spotlight: a user interface for summarizing user-generated reviews using adjective-noun word pairs, 2011, CHI.

[18]  Satoshi Sekine, et al. A survey for Multi-Document Summarization, 2003, HLT-NAACL 2003.

[19]  Dilek Z. Hakkani-Tür, et al. Long story short - Global unsupervised models for keyphrase based meeting summarization, 2010, Speech Commun.

[20]  Makoto Imase, et al. Dynamic Steiner Tree Problem, 1991, SIAM J. Discret. Math.

[21]  Barun Chandra, et al. Facility Dispersion and Remote Subgraphs, 1995, SWAT.

[22]  Jade Goldstein-Stewart, et al. The use of MMR, diversity-based reranking for reordering documents and producing summaries, 1998, SIGIR '98.

[23]  Dianne P. O'Leary, et al. Text summarization via hidden Markov models, 2001, SIGIR '01.

[24]  Vasileios Hatzivassiloglou, et al. Event-Based Extractive Summarization, 2004.

[25]  Ani Nenkova, et al. A Survey of Text Summarization Techniques, 2012, Mining Text Data.