Keyphrase Extraction Based on Prior Knowledge

Keyphrases are an important way to quickly grasp the topic of a document because they provide highly summative information. Previous approaches to keyphrase extraction rank candidates solely with statistics-based or graph-based models and thus ignore the influence of external knowledge. In this paper, we incorporate prior knowledge, consisting of a controlled vocabulary of keyphrases and their prior probabilities, to enhance these methods. First, we build a controlled vocabulary from the keyphrases in existing collections and use it to filter a set of keyphrase candidates from a given document. Then, we represent the importance of each keyphrase candidate by its prior probability together with its TF-IDF and TextRank scores. Finally, a supervised learning algorithm learns the optimal weights of these three features. Experiments on four benchmark datasets show the substantial advantages of prior knowledge for keyphrase extraction. Furthermore, our approach achieves competitive performance compared with state-of-the-art methods.
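The pipeline described above (vocabulary filtering, three features, learned weights) can be illustrated with a minimal, stdlib-only sketch. It is not the authors' implementation, and several details are assumptions: the prior probability is taken as "documents where the phrase is a gold keyphrase / documents containing the phrase", candidates are n-grams of at most three words, a phrase's TextRank score is the sum of its word scores, and logistic regression stands in for the unspecified supervised learner. All function names (`build_prior`, `features`, `learn_weights`, `extract`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, max_n=3):
    """All contiguous n-grams (n = 1..max_n), joined by single spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def build_prior(corpus):
    """corpus: list of (tokens, gold_keyphrase_set).
    Controlled vocabulary = every gold keyphrase in the corpus.
    ASSUMPTION: prior(k) = #docs with k as gold / #docs whose text or gold set contains k."""
    annotated, df = Counter(), Counter()
    for tokens, gold in corpus:
        annotated.update(gold)
        df.update(set(ngrams(tokens)) | set(gold))
    prior = {k: annotated[k] / df[k] for k in annotated}
    return prior, df, len(corpus)


def textrank(tokens, window=2, d=0.85, iters=50):
    """Word-level TextRank: PageRank over an undirected co-occurrence graph."""
    nbrs = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                nbrs[w].add(tokens[j])
                nbrs[tokens[j]].add(w)
    score = {w: 1.0 for w in set(tokens)}
    for _ in range(iters):
        score = {w: (1 - d) + d * sum(score[u] / len(nbrs[u]) for u in nbrs[w])
                 for w in score}
    return score


def features(tokens, prior, df, n_docs):
    """{candidate: [prior, tfidf, textrank]} for n-grams found in the vocabulary."""
    tf = Counter(g for g in ngrams(tokens) if g in prior)
    tr = textrank(tokens)
    feats = {}
    for k, count in tf.items():
        idf = math.log((n_docs + 1) / (df.get(k, 0) + 1)) + 1  # smoothed IDF
        # ASSUMPTION: phrase TextRank score = sum of its word scores.
        feats[k] = [prior[k], count / len(tokens) * idf,
                    sum(tr.get(w, 0.0) for w in k.split())]
    return feats


def learn_weights(corpus, prior, df, n_docs, lr=0.1, epochs=200):
    """Logistic regression by SGD (a stand-in learner): 3 feature weights + bias."""
    data = [(x, 1.0 if k in gold else 0.0)
            for tokens, gold in corpus
            for k, x in features(tokens, prior, df, n_docs).items()]
    w = [0.0, 0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[3]
            p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))
            for i in range(3):
                w[i] += lr * (y - p) * x[i]
            w[3] += lr * (y - p)
    return w


def extract(tokens, prior, df, n_docs, w, top_k=5):
    """Rank vocabulary-filtered candidates by the weighted feature sum."""
    feats = features(tokens, prior, df, n_docs)
    score = {k: sum(wi * xi for wi, xi in zip(w, x)) for k, x in feats.items()}
    return sorted(score, key=score.get, reverse=True)[:top_k]


# Toy usage: a tiny, made-up training collection (illustrative only).
corpus = [
    ("graph based ranking for keyphrase extraction".split(),
     {"keyphrase extraction", "graph based ranking"}),
    ("neural keyphrase extraction with prior knowledge".split(),
     {"keyphrase extraction", "prior knowledge"}),
    ("prior knowledge improves text summarization".split(),
     {"prior knowledge", "text summarization"}),
    ("a survey of text summarization and keyphrase extraction".split(),
     {"text summarization"}),
]
prior, df, n_docs = build_prior(corpus)
w = learn_weights(corpus, prior, df, n_docs)
doc = "we study keyphrase extraction using prior knowledge and graph based ranking".split()
result = extract(doc, prior, df, n_docs, w, top_k=3)
print(result)
```

Because candidates must appear in the controlled vocabulary, phrases never annotated in the training collection cannot be extracted at all; this is the main trade-off of the filtering step.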