Using Semantic Graphs and Word Sense Disambiguation Techniques to Improve Text Summarization

This paper presents a semantic graph-based method for extractive summarization. The summarizer uses WordNet concepts and relations to build a semantic graph that represents the document, and a degree-based clustering algorithm then discovers the different themes or topics within the text. Sentences are selected for the summary according to whether they contain the most representative concepts of each topic. The method has proven to be an efficient approach to identifying salient concepts and topics in free text. In a test on the DUC data for single-document summarization, our system achieves significantly better results than previously published approaches that rely on terms and purely syntactic information. The system can also be ported to other domains easily, since doing so only requires modifying the knowledge base and the method for concept annotation. In addition, we address the problem of word ambiguity in semantic approaches to automatic summarization.
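The pipeline described above (concept graph, degree-based clustering, sentence selection) can be illustrated with a minimal sketch. It is not the system evaluated in the paper: the `TOY_HYPERNYMS` dictionary is a hypothetical stand-in for WordNet, every word is assumed to have a single sense (so no disambiguation step is shown), and the clustering is simplified to "each highest-degree vertex plus its neighbours". All function names are illustrative.

```python
from collections import defaultdict

# Hypothetical toy taxonomy standing in for WordNet "is-a" (hypernym) relations.
TOY_HYPERNYMS = {"dog": "animal", "cat": "animal", "car": "vehicle", "bus": "vehicle"}


def tokenize(sentence):
    """Return the set of lowercased words, with basic punctuation stripped."""
    return {word.strip(".,;:!?").lower() for word in sentence.split()}


def build_concept_graph(sentences):
    """Link every recognised word to its hypernym with an undirected edge."""
    graph = defaultdict(set)
    for sentence in sentences:
        for word in tokenize(sentence):
            parent = TOY_HYPERNYMS.get(word)
            if parent is not None:
                graph[word].add(parent)
                graph[parent].add(word)
    return graph


def cluster_around_hubs(graph, n_topics):
    """Simplified degree-based clustering: the n_topics highest-degree
    vertices act as hubs, and each cluster is a hub plus its neighbours."""
    hubs = sorted(graph, key=lambda v: (-len(graph[v]), v))[:n_topics]
    return [{hub} | graph[hub] for hub in hubs]


def summarize(sentences, n_topics=2):
    """For each cluster, keep the sentence that shares the most concepts with it."""
    clusters = cluster_around_hubs(build_concept_graph(sentences), n_topics)
    chosen = set()
    for cluster in clusters:
        scores = [len(tokenize(s) & cluster) for s in sentences]
        best = max(range(len(sentences)), key=scores.__getitem__)
        if scores[best] > 0:  # skip clusters that no sentence mentions
            chosen.add(best)
    return [sentences[i] for i in sorted(chosen)]  # keep document order


DOC = [
    "The dog chased the cat.",
    "A bus hit a car.",
    "The dog barked at the bus.",
    "Rain fell all day.",
]
print(summarize(DOC))
```

On this toy document the two hubs are `animal` and `vehicle`, so the sketch keeps the one sentence that best covers each topic. A realistic version would replace the toy dictionary with a real lexical knowledge base, disambiguate word senses before building the graph, and use a richer set of semantic relations.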
