Graph-Based Technique for Extracting Keyphrases in a Single-Document (GTEK)

This paper introduces GTEK, a novel Graph-based Technique for Extracting Keyphrases in a single document, intended for use in extractive text summarization. GTEK relies on a graph-based representation of text that depends on the numeration of terms and phrases in sentences rather than on structural document features. It accounts for the influence of sentences on the phrases they contain, motivated by the observation that a phrase is likely to be important if it appears in the most important sentences of the document. The Graph-based Growing Self-Organizing Map (G-GSOM) groups the sentences into graph-based clusters. The TextRank algorithm is then applied to the cluster graphs under the assumption that the top-ranked nodes represent the most important sentences; the most frequent phrases in these sentences are selected as the document's keyphrases. Experimental results on two datasets show that the proposed technique extracts the most keyphrases.
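
To make the ranking-then-counting idea concrete, the following is a minimal sketch and not the GTEK implementation. It assumes several simplifications that do not come from the paper: the G-GSOM clustering step is omitted and the whole document is treated as one cluster, sentence similarity is the word-overlap measure commonly used with TextRank, candidate phrases are plain word bigrams, and the names `STOPWORDS`, `rank_sentences`, and `extract_keyphrases` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "in", "of", "with", "was", "is", "and", "to"}

def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())

def similarity(a, b):
    """Word-overlap similarity between two token lists, normalized by log lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    overlap = len(set(a) & set(b))
    return overlap / (math.log(len(a)) + math.log(len(b)))

def rank_sentences(token_lists, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank-style iteration over the sentence graph."""
    n = len(token_lists)
    weights = [
        [similarity(token_lists[i], token_lists[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores

def extract_keyphrases(sentences, top_sentences=2, top_phrases=3):
    """Return the most frequent stopword-free bigrams in the top-ranked sentences."""
    token_lists = [tokenize(s) for s in sentences]
    scores = rank_sentences(token_lists)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_sentences]
    counts = Counter()
    for i in best:
        tokens = token_lists[i]
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[f"{first} {second}"] += 1
    return [phrase for phrase, _ in counts.most_common(top_phrases)]

if __name__ == "__main__":
    document = [
        "Graph clustering groups sentences in the document.",
        "Graph clustering ranks sentences with graph clustering scores.",
        "The weather was pleasant yesterday.",
    ]
    print(extract_keyphrases(document, top_phrases=1))
```

In this sketch the sentence that shares little vocabulary with the others receives the lowest score, so its phrases never reach the counting step. In the technique described above, the same ranking would be run separately on the graph of each G-GSOM cluster.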