Comparative Study on Keyword Extraction Algorithms for Single Extractive Document

Automatic text summarization is one of the most discussed areas in text mining, and a variety of summarization techniques are available. Summarization is of two types: extractive and abstractive. Its main aim is to obtain a concise, meaningful text from the original document. Keywords play an important role in building a summary, and several keyword extraction algorithms have been proposed. In this paper, we implement three of the most popular keyword extraction algorithms: TF-IDF (a baseline algorithm), TextRank, and RAKE. We test how effectively each algorithm finds important keywords in a single document by comparing the retrieved keywords with manually selected keywords. The comparison evaluates the performance of the implemented algorithms against each other and against the manually selected keywords.
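As an illustration of the kind of comparison described above, the following is a minimal sketch, not the implementation used in the paper. It scores the words of one document with a TF-IDF baseline and compares the top-ranked words against a manually selected keyword list. The abstract does not name the evaluation measure, so precision, recall, and F1 are an assumption here. The function names (`tokenize`, `tfidf_keywords`, `precision_recall_f1`), the smoothed IDF variant, and the small background corpus needed to compute document frequencies are likewise hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(doc, corpus, top_k=3):
    """Return the top_k words of `doc` ranked by TF-IDF.

    `corpus` is a list of documents (including `doc`) used only to
    compute document frequencies. The IDF below is a smoothed variant,
    chosen for this sketch; other variants are equally valid.
    """
    doc_tokens = tokenize(doc)
    if not doc_tokens:
        return []
    term_counts = Counter(doc_tokens)
    doc_vocabularies = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)

    scores = {}
    for term, count in term_counts.items():
        df = sum(1 for vocab in doc_vocabularies if term in vocab)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def precision_recall_f1(extracted, manual):
    """Compare extracted keywords with manually selected keywords."""
    extracted_set, manual_set = set(extracted), set(manual)
    hits = len(extracted_set & manual_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(manual_set) if manual_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "the keyword keyword extraction",
    ]
    extracted = tfidf_keywords(corpus[2], corpus, top_k=2)
    manual = ["keyword", "summarization"]
    print(extracted, precision_recall_f1(extracted, manual))
```

The same `precision_recall_f1` helper could be applied unchanged to keyword lists produced by TextRank or RAKE, which would give a like-for-like comparison across the three algorithms.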
