Enhanced graph-based approach for multi-document summarization

Summarizing documents to meet a user's needs is tricky and challenging. Although a variety of approaches exist, graph-based methods have been widely investigated for summarizing document contents. This paper focuses on two graph-based methods, LexRank (threshold) and LexRank (continuous), proposed by Erkan and Radev. It proposes two enhancements to that earlier work by adding two features to the existing methods. First, a discounting approach is introduced to form a summary with less redundancy among sentences. Second, a position weight mechanism is adopted to preserve the importance of sentences based on the position they occupy. Intrinsic evaluation was carried out on two data sets. Data set 1 was created manually from newspaper documents that we collected for the experiments. Data set 2 is the DUC 2002 data, which is commercially available and is distributed or accessed through the National Institute of Standards and Technology (NIST). We show that the results, measured by precision and recall, were comprehensively better than those of the earlier algorithms.
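The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the position weight (`1 - pos / doc_len`), the discount factor (`1 - cosine similarity to the sentence just selected`), the plain term-frequency cosine (LexRank proper uses an IDF-modified cosine), the PageRank-style damping convention, and all function names and default values are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase word tokens; a stand-in for real preprocessing (stemming, stop words)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Plain term-frequency cosine between two Counters (assumption: no IDF weighting)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lexrank(sentences, threshold=0.1, continuous=False, damping=0.85, iters=100):
    """Return (centrality scores, raw cosine matrix) for a list of sentences.

    threshold mode: edges are binarized at `threshold`;
    continuous mode: edges keep their cosine weights.
    """
    vecs = [Counter(tokenize(s)) for s in sentences]
    n = len(vecs)
    cos = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    adj = cos if continuous else [[1.0 if v > threshold else 0.0 for v in row] for row in cos]
    # Row-normalize so each row is a probability distribution over neighbours.
    rows = []
    for row in adj:
        total = sum(row)
        rows.append([v / total for v in row] if total else [1.0 / n] * n)
    # Power iteration with a uniform jump term (PageRank-style convention).
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n + damping * sum(scores[j] * rows[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores, cos


def summarize(docs, k=2, threshold=0.1, continuous=False):
    """Pick k sentences from a list of documents (each a list of sentences)."""
    sentences = [s for doc in docs for s in doc]
    positions = [(pos, len(doc)) for doc in docs for pos in range(len(doc))]
    scores, cos = lexrank(sentences, threshold=threshold, continuous=continuous)
    # Hypothetical position weight: earlier sentences in a document count more.
    weighted = [s * (1 - pos / length) for s, (pos, length) in zip(scores, positions)]
    selected, remaining = [], set(range(len(sentences)))
    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=lambda i: weighted[i])
        selected.append(best)
        remaining.discard(best)
        # Hypothetical discounting: penalize sentences similar to the one just chosen.
        for j in remaining:
            weighted[j] *= 1.0 - cos[best][j]
    return [sentences[i] for i in selected]
```

A possible call, using made-up sentences, is `summarize([["The flood hit the city on Monday.", "Rescue teams arrived quickly."], ["The flood hit the city on Monday.", "Officials promised aid to residents."]], k=2)`. Because the duplicated sentence is discounted to roughly zero once its twin is selected, the second slot goes to a different sentence, which is the redundancy-reducing behaviour the discounting step is meant to provide.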
