A Keyword Extraction Approach for Single Document Extractive Summarization Based on Topic Centrality

Graph-based keyword extraction approaches represent a text document as a graph whose vertices are words. An edge connects two vertices only if the corresponding words are related, so the effectiveness of the representation depends on how the relationship between two words is defined. Early approaches used word order, word co-occurrence, and syntactic relationships. More recent approaches define the relationship using lexical association measures. These existing relationships help only in identifying the words that form the topics in a document. The topics, however, together form the theme of the document: the central idea conveyed through its topics. This paper defines a new relationship between words, based on lexical association among the words in the document, to capture the words that convey the theme. A graph representation of the text is built on this relationship and used to extract the words that build the theme of the document. These words are then used to select the important sentences in the document. Experiments on the DUC 2002 data set indicate that the proposed keyword extraction approach improves the quality of extractive summarization.
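The general pipeline described above (word graph, keyword ranking, sentence selection) can be illustrated with a minimal sketch. It is not the relationship proposed in the paper, which the abstract does not specify. As stand-in assumptions, it uses sentence-level pointwise mutual information (PMI) as the lexical association measure, weighted degree as the centrality score, and hypothetical helper names (`build_graph`, `rank_words`, `summarize`).

```python
import math
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphabetic tokens; no stop-word removal or stemming in this sketch.
    return re.findall(r"[a-z]+", sentence.lower())


def build_graph(sentences, threshold=0.0):
    """Word graph: an edge joins two words whose sentence-level PMI exceeds threshold."""
    n = len(sentences)
    token_sets = [set(tokenize(s)) for s in sentences]
    word_count = Counter(w for ts in token_sets for w in ts)
    pair_count = Counter(
        pair for ts in token_sets for pair in combinations(sorted(ts), 2)
    )
    graph = {w: {} for w in word_count}
    for (a, b), c in pair_count.items():
        # PMI = log( P(a, b) / (P(a) * P(b)) ), with probabilities estimated per sentence.
        pmi = math.log(c * n / (word_count[a] * word_count[b]))
        if pmi > threshold:
            graph[a][b] = pmi
            graph[b][a] = pmi
    return graph


def rank_words(graph):
    # Weighted degree centrality: sum of the edge weights incident to each word.
    return {w: sum(neighbours.values()) for w, neighbours in graph.items()}


def summarize(text, k=1):
    """Return the k highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    scores = rank_words(build_graph(sentences))

    def sentence_score(sentence):
        tokens = set(tokenize(sentence))
        return sum(scores.get(t, 0.0) for t in tokens) / (len(tokens) or 1)

    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_score(sentences[i]),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]
```

Calling `summarize(document_text, k=3)` returns three sentences in document order. Swapping the PMI line for another association measure, or `rank_words` for a different centrality, changes which words are treated as keywords without touching the sentence selection step.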
