Single Document Summarization with Document Expansion

Existing methods for single document summarization usually make use of only the information contained in the specified document. This paper proposes the technique of document expansion to provide more knowledge to help single document summarization. A specified document is expanded to a small document set by adding a few neighbor documents close to the document, and then the graph-ranking based algorithm is applied on the expanded document set for extracting sentences from the single document, by making use of both the within-document relationships between sentences of the specified document and the cross-document relationships between sentences of all documents in the document set. The experimental results on the DUC2002 dataset demonstrate the effectiveness of the proposed approach based on document expansion. The cross-document relationships between sentences in the expanded document set are validated to be very important for single document summarization.

[1]  M. F. Porter,et al.  An algorithm for suffix stripping , 1997 .

[2]  Jade Goldstein-Stewart,et al.  The use of MMR, diversity-based reranking for reordering documents and producing summaries , 1998, SIGIR '98.

[3]  Eduard H. Hovy,et al.  Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics , 2003, NAACL.

[4]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[5]  Francine Chen,et al.  A trainable document summarizer , 1995, SIGIR '95.

[6]  Hua Li,et al.  Improving web search results using affinity graph , 2005, SIGIR '05.

[7]  Dianne P. O'Leary,et al.  Text summarization via hidden Markov models , 2001, SIGIR '01.

[8]  Rada Mihalcea,et al.  TextRank: Bringing Order into Text , 2004, EMNLP.

[9]  Rada Mihalcea,et al.  A Language Independent Algorithm for Single and Multiple Document Summarization , 2005, IJCNLP.

[10]  Xin Liu,et al.  Generic text summarization using relevance measure and latent semantic analysis , 2001, SIGIR '01.

[11]  Massih-Reza Amini,et al.  The use of unlabeled data to improve supervised learning for text summarization , 2002, SIGIR '02.

[12]  Yuji Matsumoto,et al.  A new approach to unsupervised text summarization , 2001, SIGIR '01.

[13]  Hongyuan Zha,et al.  Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering , 2002, SIGIR '02.

[14]  Eduard H. Hovy,et al.  Automated Text Summarization and the SUMMARIST System , 1998, TIPSTER.

[15]  Eduard Hovy,et al.  Automated Text Summarization in SUMMARIST , 1997, ACL 1997.

[16]  Dragomir R. Radev,et al.  LexPageRank: Prestige in Multi-Document Text Summarization , 2004, EMNLP.

[17]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[18]  Christian Böhm,et al.  Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases , 2001, CSUR.

[19]  Eduard H. Hovy,et al.  The Automated Acquisition of Topic Signatures for Text Summarization , 2000, COLING.

[20]  Hua Li,et al.  Document Summarization Using Conditional Random Fields , 2007, IJCAI.