Automatic Summarization for Chinese Text Using Affinity Propagation Clustering and Latent Semantic Analysis

With the rapid development of the Internet, ever more information can be collected, which also means we need the ability to quickly find what is actually useful within it. Automatic summarization helps in handling the huge amount of text on the Web. This paper proposes a Chinese summarization method based on Affinity Propagation (AP) clustering and latent semantic analysis (LSA). AP is a clustering algorithm introduced by B. J. Frey and D. Dueck in Science in 2007; it takes as input measures of similarity between pairs of data points and simultaneously considers all data points as potential exemplars. LSA is a technique in natural language processing, in particular in vectorial semantics, for analyzing the relationships among a set of sentences. Experimental results show that the method produces more comprehensive and higher-quality summaries.
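To make the description of AP concrete, the following is a minimal pure-Python sketch of the message-passing updates (responsibilities and availabilities) on a pairwise similarity matrix. It is an illustration under stated assumptions, not the paper's implementation: the function names `similarity_matrix` and `affinity_propagation`, the damping value, the iteration count, the use of negative squared Euclidean distance as similarity, and the toy 2-D vectors standing in for LSA sentence vectors are all hypothetical choices made here.

```python
from statistics import median


def similarity_matrix(points, preference=None):
    """Negative squared Euclidean distance between every pair of points.

    The diagonal holds the "preference" (assumed here to default to the
    median off-diagonal similarity); larger values yield more exemplars.
    """
    n = len(points)
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
          for j in range(n)] for i in range(n)]
    if preference is None:
        preference = median(S[i][j] for i in range(n)
                            for j in range(n) if i != j)
    for k in range(n):
        S[k][k] = preference
    return S


def affinity_propagation(S, damping=0.7, iterations=200):
    """Return (exemplars, labels) for an n x n similarity matrix, n >= 2."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(iterations):
        # r(i, k): how well suited k is as exemplar for i, relative to
        # the best competing candidate k' != k.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                new = S[i][k] - best_other
                R[i][k] = damping * R[i][k] + (1 - damping) * new

        # a(i, k): evidence from the other points that k is an exemplar.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support from points != k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # Every point was a candidate; exemplars are those with r + a > 0.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # fallback: keep the single strongest candidate
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Toy vectors standing in for sentence representations (two clear groups).
vectors = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0),
           (5.0, 5.0), (5.1, 5.0), (5.2, 5.0)]
exemplars, labels = affinity_propagation(similarity_matrix(vectors))
print(exemplars, labels)
```

In a summarization setting, each exemplar would correspond to a representative sentence for its cluster; how the paper derives sentence similarities from LSA and selects summary sentences is not specified in the abstract, so that step is left out here.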
