The two-stage unsupervised approach to multidocument summarization

This paper proposes an approach to summarizing a set of documents by revealing their topics and extracting informative sentences. The topics are determined by clustering sentences, and the informative sentences are extracted with a ranking algorithm. The summarization result is shown to depend on the clustering method, the ranking algorithm, and the similarity measure. Experiments on the open benchmark datasets DUC2001 and DUC2002 show that the suggested clustering methods and ranking algorithm give better results than the known k-means method and the PageRank and HITS ranking algorithms.
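
The two-stage idea (cluster sentences into topics, then rank sentences within each topic) can be illustrated with a minimal sketch. It is not the paper's method: the clustering here is a plain k-means-style loop, the ranking is similarity to the cluster centroid, and the similarity measure is cosine over raw term counts, all chosen only as simple stand-ins. The function names (`vectorize`, `cosine`, `cluster_sentences`, `summarize`) are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(sentence):
    """Bag-of-words term-frequency vector for one sentence."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Sum of member vectors; only its direction matters for cosine."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def cluster_sentences(vectors, k, iterations=10):
    """Stage 1 (stand-in): k-means-style clustering under cosine similarity."""
    k = min(k, len(vectors))
    # Assumption: deterministic seeding with the first k sentences.
    centroids = [Counter(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels


def summarize(sentences, k=2):
    """Stage 2 (stand-in): pick the sentence closest to each cluster centroid."""
    vectors = [vectorize(s) for s in sentences]
    labels = cluster_sentences(vectors, k)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        center = centroid([vectors[i] for i in members])
        chosen.append(max(members, key=lambda i: cosine(vectors[i], center)))
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    docs = [
        "The cat sat on the mat.",
        "Stock markets fell sharply today.",
        "A cat sat on a mat.",
        "Markets fell as stock prices dropped.",
    ]
    for sentence in summarize(docs, k=2):
        print(sentence)
```

Each of the three components named in the abstract (clustering method, ranking algorithm, similarity measure) is a separate function in this sketch, so swapping any one of them changes the summary without touching the others.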
