A Knowledge Induced Graph-Theoretical Model for Extract and Abstract Single Document Summarization

Summarization mainly provides the major topics or theme of a document in a limited number of words. In an extract summary we depend on sentences extracted from the document, whereas in an abstract summary each summary sentence may contain concise information from multiple sentences. The major factors that affect the quality of a summary are: (1) how noisy or less important terms in the document are handled, (2) how the information content of terms is used (each term may have a different level of importance in the document), and (3) how the appropriate thematic facts are identified for the summary. To reduce the effect of noisy terms and to use the information content of terms in the document, we introduce a graph-theoretical model populated with the semantic and statistical importance of terms. Next, we introduce the concept of weighted minimum vertex cover, which helps us identify the most representative and thematic facts in the document. Additionally, to generate an abstract summary, we introduce a vertex-constrained shortest-path technique that uses the minimum vertex cover information as a valuable resource. Our experimental results on the DUC-2001 and DUC-2002 datasets show that our system performs better than the baseline systems.
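To make the weighted minimum vertex cover idea concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes a toy undirected graph whose vertices are terms and whose integer vertex weights are hypothetical costs (lower means more desirable to select); the abstract does not specify how the graph is built or how weights are assigned. The function name `weighted_vertex_cover` and the example data are illustrative only, and the algorithm shown is the standard pricing (local-ratio) 2-approximation rather than an exact solver.

```python
def weighted_vertex_cover(edges, weight):
    """Pricing-method 2-approximation for minimum weighted vertex cover.

    edges  : iterable of (u, v) pairs of an undirected graph
    weight : dict mapping each vertex to a non-negative cost
    Returns a set of vertices that touches every edge.
    """
    residual = dict(weight)  # cost of each vertex not yet "paid for"
    cover = set()
    for u, v in edges:
        if u in cover or v in cover:
            continue  # this edge is already covered
        # The edge pays as much as the cheaper endpoint can still absorb.
        pay = min(residual[u], residual[v])
        residual[u] -= pay
        residual[v] -= pay
        # A vertex whose residual cost reaches zero joins the cover.
        if residual[u] == 0:
            cover.add(u)
        if residual[v] == 0:
            cover.add(v)
    return cover


# Hypothetical term graph: an edge links two terms assumed to be related.
edges = [
    ("graph", "summary"),
    ("graph", "cover"),
    ("summary", "sentence"),
    ("cover", "path"),
]
# Hypothetical costs; in practice they would be derived from term importance.
weight = {"graph": 1, "summary": 3, "cover": 2, "sentence": 1, "path": 4}

cover = weighted_vertex_cover(edges, weight)
total_cost = sum(weight[v] for v in cover)
print(sorted(cover), total_cost)
```

Every edge of the toy graph has at least one endpoint in the returned set, so the selected terms jointly "represent" all the relations in the graph. The pricing method guarantees a total cost of at most twice the optimum, which is why it is a convenient stand-in here; a local search solver could be substituted without changing the interface.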
