Keyphrase Extraction from Bengali Document using LSTM Recurrent Neural Network

Keyphrases are single-word or multi-word phrases that convey the principal points of a document. They help readers get an overview of the document. In this paper, we propose a system that uses a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) to automatically detect keyphrases in a document. We also implement a Multilayer Perceptron (MLP) network as a baseline against which to compare the performance of the proposed LSTM approach. We apply several pre-processing steps to a document to generate candidate keyphrases. Finally, we find that the proposed approach performs better than the MLP network.
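
The abstract does not list the pre-processing steps, so the following is only an illustrative sketch of one common way to generate candidate keyphrases: split the text at punctuation and stopwords, then collect the word n-grams inside each remaining run. The function name `candidate_keyphrases`, the tiny stopword list, and the `max_len` limit are all assumptions made for this example, not details from the paper.

```python
import re

# Hypothetical stopword list for illustration only; not taken from the paper.
STOPWORDS = {"এবং", "ও", "এই", "একটি"}


def candidate_keyphrases(text, stopwords=STOPWORDS, max_len=3):
    """Return word n-grams (n <= max_len) that do not cross punctuation
    or stopwords, in order of first emission and without duplicates."""
    # Split at sentence punctuation, including the Bengali danda.
    chunks = re.split(r"[।,;:!?.\n]+", text)
    candidates = []
    seen = set()
    for chunk in chunks:
        run = []  # current run of consecutive non-stopword tokens
        # The trailing None acts as a sentinel that flushes the last run.
        for token in chunk.split() + [None]:
            if token is not None and token not in stopwords:
                run.append(token)
                continue
            # A stopword or the end of the chunk closes the run:
            # emit every n-gram of the run, shortest first.
            for n in range(1, max_len + 1):
                for i in range(len(run) - n + 1):
                    phrase = " ".join(run[i:i + n])
                    if phrase not in seen:
                        seen.add(phrase)
                        candidates.append(phrase)
            run = []
    return candidates


if __name__ == "__main__":
    for phrase in candidate_keyphrases("বাংলা ভাষা এবং সাহিত্য।"):
        print(phrase)
```

In a pipeline like the one described, each candidate would then be turned into features and passed to a classifier such as the LSTM or MLP network; that stage is not sketched here because the abstract gives no details about it.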
