Paper Information - A Multi-Document Multi-Lingual Automatic Summarization System

A Multi-Document Multi-Lingual Automatic Summarization System

Abstract. In this paper, a new multi-document multi-lingual text summarization technique, based on singular value decomposition and hierarchical clustering, is proposed. The proposed approach relies on only two resources for any language: a word segmentation system and a dictionary of words along with their document frequencies. The summarizer initially takes a collection of related documents and transforms them into a matrix; it then applies singular value decomposition to the resulting matrix. After using a binary hierarchical clustering algorithm, the most important sentences of the most important clusters form the summary. The appropriate place of each chosen sentence is determined by a novel technique. The system has been successfully tested on summarizing several Persian document collections.
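
The first stages described in the abstract (documents to matrix, decomposition, binary split) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting scheme, the clustering criterion, or the sentence-ordering technique. The tf-idf weighting, the row centring, the split by sign of the leading right singular vector, and the names `term_sentence_matrix`, `leading_right_singular_vector`, and `split_in_two` are all assumptions made here for illustration. The singular vector is approximated by power iteration rather than a full SVD, so that the example needs only the standard library. Sentences are assumed to be already tokenized by some word segmentation system.

```python
import math


def term_sentence_matrix(sentences, doc_freq, n_docs):
    """Rows are vocabulary terms, columns are sentences; entries are tf * idf.

    The tf-idf weighting is an assumption of this sketch, not taken from the paper.
    """
    vocab = sorted({tok for sent in sentences for tok in sent})
    matrix = []
    for term in vocab:
        # Terms missing from the dictionary default to df = 1 (assumption).
        idf = math.log(n_docs / doc_freq.get(term, 1))
        matrix.append([sent.count(term) * idf for sent in sentences])
    return vocab, matrix


def leading_right_singular_vector(matrix, iters=100):
    """Approximate the top right singular vector by power iteration on A^T A."""
    n_cols = len(matrix[0])
    # Deterministic, non-uniform start (a uniform start is annihilated
    # by a row-centred matrix).
    v = [float(j + 1) for j in range(n_cols)]
    for _ in range(iters):
        u = [sum(a * x for a, x in zip(row, v)) for row in matrix]  # u = A v
        w = [sum(row[j] * ui for row, ui in zip(matrix, u))
             for j in range(n_cols)]                                # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def split_in_two(matrix):
    """One binary split of the sentences (columns).

    Each term row is centred, then sentences are separated by the sign of
    their coordinate in the leading right singular vector.
    """
    centred = []
    for row in matrix:
        mean = sum(row) / len(row)
        centred.append([x - mean for x in row])
    v = leading_right_singular_vector(centred)
    left = [j for j, x in enumerate(v) if x <= 0]
    right = [j for j, x in enumerate(v) if x > 0]
    return left, right


# Toy, hypothetical data: four pre-segmented sentences on two topics.
sentences = [
    ["economy", "market", "growth"],
    ["market", "growth", "bank"],
    ["football", "match", "goal"],
    ["match", "goal", "team"],
]
doc_freq = {tok: 2 for sent in sentences for tok in sent}
vocab, matrix = term_sentence_matrix(sentences, doc_freq, n_docs=10)
clusters = split_in_two(matrix)
print(clusters)
```

On this toy input the split separates the two economy sentences from the two football sentences. Applying `split_in_two` recursively to each side would give a binary hierarchy; how clusters and sentences are then ranked and ordered is left open here, since the abstract does not specify it.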

Gholamreza Ghassem-Sani | Seyed Abolghasem Mirroshandel | Mohamad Ali Honarpisheh
