Paper Information - Single Document Summarization with Document Expansion

Single Document Summarization with Document Expansion

Existing methods for single document summarization usually use only the information contained in the specified document. This paper proposes document expansion as a technique for providing additional knowledge to single document summarization. The specified document is expanded into a small document set by adding a few neighbor documents that are close to it, and a graph-ranking based algorithm is then applied to the expanded set to extract sentences from the single document. The algorithm uses both the within-document relationships between sentences of the specified document and the cross-document relationships between sentences of all documents in the set. Experimental results on the DUC2002 dataset demonstrate the effectiveness of the proposed approach. The cross-document relationships between sentences in the expanded document set are shown to be very important for single document summarization.
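The pipeline described in the abstract (expand the document with a few neighbors, rank all sentences on one graph, keep only sentences from the original document) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the similarity measure, the edge weighting, or the ranking formula, so the sketch assumes plain term-frequency cosine similarity, a PageRank-style iteration, and a hypothetical `cross_weight` parameter that scales cross-document edges. All function names are illustrative.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a stand-in for real preprocessing (stemming, stop words)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_sentences(doc):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def expand(doc, corpus, k=2):
    """Return the k corpus documents most similar to doc (assumed neighbor criterion)."""
    vec = Counter(tokens(doc))
    ranked = sorted(corpus, key=lambda d: cosine(vec, Counter(tokens(d))), reverse=True)
    return ranked[:k]


def summarize(doc, corpus, k=2, n=1, damping=0.85, cross_weight=1.0, iters=50):
    """Rank sentences of the expanded set, then keep the top n from doc only."""
    docs = [doc] + expand(doc, corpus, k)
    # Each node is (document index, sentence text); document 0 is the target.
    nodes = [(i, s) for i, d in enumerate(docs) for s in split_sentences(d)]
    if not nodes:
        return []
    vecs = [Counter(tokens(s)) for _, s in nodes]
    size = len(nodes)
    # Affinity matrix: no self-loops; cross-document edges scaled by cross_weight.
    w = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i != j:
                sim = cosine(vecs[i], vecs[j])
                same_doc = nodes[i][0] == nodes[j][0]
                w[i][j] = sim if same_doc else cross_weight * sim
    out = [sum(row) for row in w]
    scores = [1.0 / size] * size
    # PageRank-style power iteration over the row-normalized affinity matrix.
    for _ in range(iters):
        scores = [
            (1 - damping) / size
            + damping * sum(scores[j] * w[j][i] / out[j] for j in range(size) if out[j])
            for i in range(size)
        ]
    own = [(scores[i], s) for i, (d, s) in enumerate(nodes) if d == 0]
    top = sorted(own, key=lambda p: p[0], reverse=True)[:n]
    chosen = {s for _, s in top}
    # Emit the chosen sentences in their original document order.
    return [s for s in split_sentences(doc) if s in chosen]
```

Setting `cross_weight=0.0` removes the influence of neighbor sentences, which gives a simple way to compare within-document ranking against ranking on the expanded set.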

Xiaojun Wan | Jianwu Yang
