A CLUSTERED SEMANTIC GRAPH APPROACH FOR MULTI-DOCUMENT ABSTRACTIVE SUMMARIZATION

Multi-document abstractive summarization aims to create a compact version of the source text that preserves its important information. Existing graph-based methods rely on the bag-of-words approach, which treats a sentence as a bag of words and relies on a content similarity measure. The obvious limitation of the bag-of-words approach is that it ignores semantic relationships among words, so the summary produced from the source text may not be adequate. This paper proposes a clustered semantic graph-based approach for multi-document abstractive summarization. The approach employs semantic role labeling (SRL) to extract the semantic structure (predicate argument structures) from the document text. The predicate argument structures (PASs) are compared pairwise using the Lin semantic similarity measure to build a semantic similarity matrix, which is then represented as a semantic graph in which the vertices represent the PASs and the edges carry the semantic similarity weights between the vertices. Content for the summary is selected by ranking the important graph vertices (PASs) with a modified graph-based ranking algorithm. Agglomerative hierarchical clustering is then performed to eliminate redundancy: the representative PAS with the highest salience score from each cluster is chosen and fed to language generation to produce the summary sentences. Experiments are performed on DUC-2002, a standard corpus for text summarization. Experimental results reveal that the proposed approach outperforms other summarization systems.
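
The ranking and redundancy-elimination steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the modified ranking algorithm, so a weighted PageRank-style iteration is assumed in its place, and average-linkage merging with a similarity threshold is assumed for the agglomerative clustering. The function names, the damping factor, the threshold, and the toy similarity matrix (which stands in for Lin similarity scores between four PASs) are all hypothetical.

```python
from itertools import combinations


def rank_vertices(sim, damping=0.85, iterations=50):
    """Weighted PageRank-style scores over a symmetric similarity matrix.

    Assumption: this stands in for the unspecified modified ranking algorithm.
    """
    n = len(sim)
    # Total edge weight attached to each vertex (self-similarity is ignored).
    out_weight = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of the edge (j, i).
            incoming = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if j != i and out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def average_link(sim, a, b):
    """Mean pairwise similarity between two clusters of vertex indices."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def agglomerative_clusters(sim, threshold):
    """Merge the most similar pair of clusters until none reaches threshold."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_link(sim, clusters[p[0]], clusters[p[1]]),
        )
        if average_link(sim, clusters[a], clusters[b]) < threshold:
            break
        # a < b always holds, so deleting index b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def select_representatives(sim, threshold=0.5):
    """Pick the highest-scoring vertex from each cluster, in index order."""
    scores = rank_vertices(sim)
    clusters = agglomerative_clusters(sim, threshold)
    return sorted(max(c, key=lambda i: scores[i]) for c in clusters)


# Hypothetical similarity matrix for four PASs: 0 and 1 are near-duplicates,
# as are 2 and 3.
toy_sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
representatives = select_representatives(toy_sim)
```

On this toy input the four PASs fall into two clusters, {0, 1} and {2, 3}, and one representative is kept from each, so near-duplicate content contributes a single PAS to language generation. In a full system the matrix would be filled with pairwise Lin similarity scores between the PASs extracted by SRL.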