Keyphrase Generation Based on Self-Attention Mechanism

Keyphrases provide concise, valuable summaries of a text. They help us not only understand its semantics but also organize and retrieve its content effectively. The task of generating keyphrases automatically has received considerable attention in recent decades, and previous studies offer many workable solutions. One method divides the text into multiple chunks, then ranks them and selects the most important ones. Its disadvantage is that it cannot produce keyphrases that do not appear in the text, let alone capture the underlying semantics. Another approach uses recurrent neural networks to generate keyphrases from the semantics of the text, but the inherently sequential computation precludes parallelization within training examples, and dependencies between distant positions are hard to capture. Previous work has demonstrated the benefits of the self-attention mechanism, which can learn global dependencies in text and can be parallelized. Inspired by these observations, we propose a keyphrase generation model based entirely on the self-attention mechanism. It is an encoder-decoder model that effectively addresses the disadvantages above. In addition, we consider the semantic similarity between keyphrases and add a semantic similarity processing module to the model. Empirical analysis on five datasets shows that the proposed model achieves competitive performance compared with baseline methods.
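To make the self-attention idea concrete, the following is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under stated assumptions, not the model proposed here: the learned query, key, and value projections are replaced by the identity (Q = K = V = input), there is a single head, and the function names and toy token vectors are hypothetical. It shows the two properties the abstract relies on: every position attends to every other position regardless of distance, and each output row is computed independently of the others, so the rows could be computed in parallel.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption for illustration: identity projections (Q = K = V = x).
    Returns the attended vectors and the attention weight matrix.
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:  # each query row is independent of the others
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, x)) for c in range(d)])
    return outputs, weights


# Toy input: three 2-dimensional token vectors (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(tokens)
```

Each row of `weights` sums to 1 and spans the whole sequence, which is what lets a token draw on context at any distance in a single step.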
