Keyword Indexing System with HowNet and PageRank

Keyword indexing is widely used in natural language processing. This paper proposes an unsupervised keyword indexing method based on PageRank and HowNet. In this method, a free text is first represented as a sememe graph, with sememes as vertices and the HowNet-based relatedness between sememes as weighted edges. UW-PageRank is then applied to the sememe graph to score the importance of each sememe. The score of each definition of a word is computed from the scores of the sememes it contains, and the highest-scoring definition is assigned to the word. A sememe graph is then rebuilt using only the selected definition of each word, and UW-PageRank is applied again to score all sememes, from which the importance of each word is derived. Finally, the highest-scoring words are indexed as keywords. Experimental results show that the method is practical and effective.
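
The scoring and definition-selection steps can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: UW-PageRank is taken to be PageRank on an undirected weighted graph, where each vertex passes its score to its neighbours in proportion to edge weight; the damping factor is 0.85; and a definition's score is the mean of its sememe scores. The function names, the toy sememes, and the relatedness weights are all hypothetical, and no real HowNet lookup is performed.

```python
from collections import defaultdict


def uw_pagerank(edges, damping=0.85, iterations=100, tol=1e-8):
    """Score vertices of an undirected weighted graph (assumed UW-PageRank form).

    edges: iterable of (u, v, weight) tuples with weight > 0.
    """
    weights = defaultdict(dict)
    for u, v, w in edges:
        # Undirected: store each edge in both directions.
        weights[u][v] = w
        weights[v][u] = w

    nodes = sorted(weights)
    n = len(nodes)
    # Total edge weight attached to each vertex.
    strength = {u: sum(weights[u].values()) for u in nodes}
    scores = {u: 1.0 / n for u in nodes}

    for _ in range(iterations):
        new_scores = {}
        for u in nodes:
            # Each neighbour v passes on a share of its score
            # proportional to the weight of the edge (v, u).
            incoming = sum(scores[v] * w / strength[v]
                           for v, w in weights[u].items())
            new_scores[u] = (1 - damping) / n + damping * incoming
        delta = sum(abs(new_scores[u] - scores[u]) for u in nodes)
        scores = new_scores
        if delta < tol:
            break
    return scores


def score_definition(sememes, scores):
    """Assumption: a definition's score is the mean of its sememe scores."""
    return sum(scores.get(s, 0.0) for s in sememes) / len(sememes)


def pick_definition(definitions, scores):
    """Return the highest-scoring candidate definition of a word."""
    return max(definitions, key=lambda d: score_definition(d, scores))


# Hypothetical sememe graph; the weights stand in for HowNet relatedness.
edges = [
    ("money", "institution", 0.9),
    ("money", "commerce", 0.8),
    ("institution", "commerce", 0.7),
    ("land", "water", 0.5),
    ("water", "commerce", 0.1),
]
scores = uw_pagerank(edges)

# Two hypothetical candidate definitions of one ambiguous word.
candidates = [("money", "institution"), ("land", "water")]
best = pick_definition(candidates, scores)
```

In this toy graph the sememes of the first definition are strongly connected to the rest of the text, so it is the one selected. Under the same assumptions, the second pass would rebuild the graph from the selected definitions only and rank words by their definition scores.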