Paper Information - Keyphrase Extraction Based on Prior Knowledge

Keyphrase Extraction Based on Prior Knowledge

Keyphrases are an important way to quickly grasp the topic of a document because they provide highly summative information. Previous approaches to keyphrase extraction simply rank keyphrases with a statistics-based or graph-based model and ignore the influence of external knowledge. In this paper, we take prior knowledge, consisting of a controlled vocabulary of keyphrases and their prior probabilities, into consideration to enhance previous methods. First, we build a controlled vocabulary from the keyphrases in existing collections and use it to filter a keyphrase candidate set from a given document. Then, we use the prior probability, together with TF-IDF and TextRank, to represent the importance of each keyphrase candidate. Finally, a supervised learning algorithm learns the optimal weights of these three features. Experiments on four benchmark datasets show the great advantages of prior knowledge for keyphrase extraction. Furthermore, we achieve competitive performance compared with state-of-the-art methods.
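The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the prior probability, so here it is assumed to be the fraction of documents containing a phrase in which that phrase was labelled as a keyphrase. The function names (`build_prior`, `candidates`, `rank`) are hypothetical, candidate matching is naive substring search, the TF-IDF and TextRank scores are taken as precomputed inputs, and the weights are fixed placeholders where the paper learns them with a supervised algorithm.

```python
from collections import Counter


def build_prior(collection):
    """Build a controlled vocabulary with prior probabilities.

    collection: list of (text, keyphrases) pairs from existing labelled documents.
    ASSUMPTION: prior = (# docs where the phrase is a labelled keyphrase)
                      / (# docs whose text contains the phrase).
    """
    labelled = Counter()
    for _, keyphrases in collection:
        labelled.update({k.lower() for k in keyphrases})
    prior = {}
    for phrase, n_labelled in labelled.items():
        n_contains = sum(1 for text, _ in collection if phrase in text.lower())
        # A phrase can be labelled without appearing verbatim, so cap at 1.0.
        prior[phrase] = min(1.0, n_labelled / n_contains) if n_contains else 0.0
    return prior


def candidates(text, prior):
    """Keep only vocabulary phrases that occur in the document (naive substring match)."""
    lowered = text.lower()
    return [p for p in prior if p in lowered]


def rank(text, prior, tfidf, textrank, weights=(1.0, 1.0, 1.0)):
    """Rank candidates by a weighted sum of the three features.

    tfidf, textrank: dicts mapping phrase -> precomputed score (assumed inputs).
    weights: placeholder values; the paper learns them with supervision.
    """
    w_prior, w_tfidf, w_textrank = weights
    scored = {
        p: w_prior * prior[p]
        + w_tfidf * tfidf.get(p, 0.0)
        + w_textrank * textrank.get(p, 0.0)
        for p in candidates(text, prior)
    }
    return sorted(scored, key=scored.get, reverse=True)
```

Filtering by the vocabulary before scoring means phrases never seen as keyphrases in the existing collections are not considered at all, which is the trade-off a controlled vocabulary implies.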

Chuan Wu | Wei Lu | Haoran Cui | Guoxiu He | Junwei Fang

[1] Rada Mihalcea, et al. TextRank: Bringing Order into Text, 2004, EMNLP.

[2] Maurizio Marchese, et al. Large Dataset for Keyphrases Extraction, 2009.

[3] Yoshua Bengio, et al. Learning long-term dependencies with gradient descent is difficult, 1994, IEEE Trans. Neural Networks.

[4] Min-Yen Kan, et al. Keyphrase Extraction in Scientific Publications, 2007, ICADL.

[5] Shuguang Han, et al. Deep Keyphrase Generation, 2017, ACL.

[6] Anette Hulth, et al. Improved Automatic Keyword Extraction Given More Linguistic Knowledge, 2003, EMNLP.

[7] Yi-fang Brook Wu, et al. Domain-specific keyphrase extraction, 2005, CIKM '05.