iSpreadRank: Ranking sentences for extraction-based summarization using feature weight propagation in the sentence similarity network

Sentence extraction is a widely adopted text summarization technique in which the most important sentences are extracted from one or more documents and presented as a summary. The first step in sentence extraction is to rank sentences by their importance for inclusion in the summary. This paper proposes a novel graph-based ranking method, iSpreadRank, to perform this task. iSpreadRank models a set of topic-related documents as a sentence similarity network. On this network, iSpreadRank applies spreading activation theory to formulate a general concept from social network analysis: the importance of a node in a network (here, a sentence) is determined not only by the number of nodes to which it connects, but also by the importance of those connected nodes. The algorithm recursively re-weights the importance of sentences by spreading their sentence-specific feature scores throughout the network to adjust the importance of other sentences. The result is a ranking that indicates the relative importance of the sentences. This paper also develops an approach to produce a generic extractive summary from the inferred sentence ranking. The proposed summarization method is evaluated on the DUC 2004 data set, where it obtains a ROUGE-1 score of 0.38068, within 0.00156 of the best participant in the DUC 2004 evaluation.
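The ranking idea described above can be sketched in Python. This is a minimal illustration under stated assumptions, not iSpreadRank as published: the cosine-similarity edges, the 0.1 edge threshold, the decay factor of 0.85, the stopword list, the uniform feature scores, and the greedy, redundancy-filtered selection step are all assumptions, and the paper's actual sentence features and summary-generation procedure are not reproduced. The function names `term_vector`, `similarity_network`, `spread_activation`, and `select_summary` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "on", "of", "and", "to", "in"}


def term_vector(sentence: str) -> Counter:
    """Term-frequency vector of lowercase word tokens, stopwords removed."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_network(vectors: list[Counter], threshold: float = 0.1) -> list[list[float]]:
    """Symmetric weighted adjacency matrix; edges below `threshold` are dropped."""
    n = len(vectors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                w[i][j] = w[j][i] = sim
    return w


def spread_activation(w, feature_scores, decay=0.85, tol=1e-10, max_iter=500):
    """Iterate a <- x + decay * M a, where x holds the (normalised) feature
    scores and M column-normalises w, so each sentence splits its activation
    among its neighbours in proportion to edge weight. With decay < 1 each
    step is a contraction, so the iteration converges; the final activations
    are rescaled to sum to 1."""
    n = len(w)
    col_sums = [sum(w[i][j] for i in range(n)) for j in range(n)]
    total_x = sum(feature_scores)
    x = [s / total_x for s in feature_scores]
    a = x[:]
    for _ in range(max_iter):
        new = [
            x[i] + decay * sum(w[i][j] / col_sums[j] * a[j]
                               for j in range(n) if col_sums[j] > 0)
            for i in range(n)
        ]
        converged = max(abs(p - q) for p, q in zip(new, a)) < tol
        a = new
        if converged:
            break
    total = sum(a)
    return [v / total for v in a]


def select_summary(sentences, scores, vectors, max_words=100, redundancy=0.5):
    """Greedily take sentences in rank order, skipping any that would exceed
    the word budget or that are too similar to a sentence already chosen."""
    ranking = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranking:
        length = len(sentences[i].split())
        if used + length > max_words:
            continue
        if any(cosine(vectors[i], vectors[j]) >= redundancy for j in chosen):
            continue
        chosen.append(i)
        used += length
    return chosen


sentences = [
    "The storm hit the coast on Monday.",
    "The storm caused flooding along the coast.",
    "Officials said flooding closed several roads.",
    "Fans celebrated their football victory.",
]
vectors = [term_vector(s) for s in sentences]
w = similarity_network(vectors)
# Uniform feature scores stand in for the paper's sentence-specific features.
scores = spread_activation(w, [1.0, 1.0, 1.0, 1.0])
print(select_summary(sentences, scores, vectors, max_words=14))  # prints [1, 0]
```

With uniform feature scores, the second sentence, which links the two storm-related sentences, collects the most activation, while the unconnected fourth sentence keeps only its own input. Because the decay factor is below 1, each propagation step shrinks differences between successive activation vectors, so the iteration settles on a fixed point instead of growing without bound.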
