Highlighting Diverse Concepts in Documents

We present the foundations of a method for summarizing documents: given a document, it automatically highlights a small set of sentences that together are expected to cover the document's different aspects. The sentences are selected using simple coverage and orthogonality criteria. We describe a novel combinatorial formulation that captures the document-summarization problem exactly, and we develop simple and efficient algorithms for solving it. We compare our algorithms with many popular document-summarization techniques in a broad set of experiments on real data. The results demonstrate that our algorithms work well in practice and produce high-quality summaries.
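To make the coverage and orthogonality criteria concrete, here is a minimal sketch of a greedy sentence selector. It is not the formulation or the algorithms from the paper, which this abstract does not spell out. It assumes that "coverage" means the number of distinct words a sentence adds to the summary and that "orthogonality" means little word overlap with sentences already chosen. The names `tokenize` and `highlight` and the scoring rule are hypothetical.

```python
import re


def tokenize(sentence):
    """Return the set of lowercase alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def highlight(sentences, k):
    """Greedily pick up to k sentence indices (illustrative assumption only).

    Score = newly covered words (coverage) minus words already covered
    by earlier picks (a crude orthogonality penalty).
    """
    word_sets = [tokenize(s) for s in sentences]
    covered, chosen = set(), []
    for _ in range(k):
        best, best_score = None, None
        for i, words in enumerate(word_sets):
            new = len(words - covered)
            # Skip sentences already chosen or adding nothing new.
            if i in chosen or new == 0:
                continue
            score = new - len(words & covered)
            if best_score is None or score > best_score:
                best, best_score = i, score
        if best is None:
            break  # every remaining sentence is redundant
        chosen.append(best)
        covered |= word_sets[best]
    return sorted(chosen)  # report highlights in document order


sentences = [
    "the cat sat on the mat",
    "the cat sat",
    "dogs chase balls in parks",
    "the mat",
]
print(highlight(sentences, 2))
```

With this toy input, the selector picks the first and third sentences: the second and fourth add no words that the first does not already cover, so they are never highlighted even if `k` is raised.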
