A Keyword Extraction Method for Chinese Scientific Abstracts

Keyword extraction plays an essential role in text mining and subsequent semantic analysis. Extracting keywords from short text is a significant challenge, particularly for short Chinese text. This paper presents a keyword extraction method for Chinese scientific abstracts. First, the abstract is segmented into meaningful Chinese words. Second, after stop words are removed, TextRank, a graph-based ranking model, is used to extract keywords, and multi-word candidates are generated by concatenating keywords that are adjacent in the abstract text. Third, a keyword dictionary and a probability calculation algorithm based on a Chinese corpus are proposed to determine whether a candidate word sequence is a correct multi-word keyword. A comparative experiment demonstrates the effectiveness of the method: it outperforms both the TextRank and TF-IDF algorithms in Chinese multi-word keyword extraction.
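The three-step pipeline can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the abstract has already been segmented into words, for example by an external segmenter that is not called here. It ranks words with TextRank on an unweighted co-occurrence graph, then joins keywords that are adjacent in the original token sequence, so any intervening stop word breaks a run. The abstract does not specify the probability calculation, so `is_valid_multiword` uses a hypothetical stand-in: the product of bigram conditional probabilities estimated from corpus counts, with an assumed threshold of 0.5. The function names, the co-occurrence window, the damping factor of 0.85, and the iteration count are all illustrative choices.

```python
from collections import defaultdict


def textrank(tokens, stop_words, window=2, damping=0.85, iterations=50):
    """Score words with TextRank on an unweighted co-occurrence graph."""
    # Drop stop words before building the graph.
    words = [t for t in tokens if t not in stop_words]
    # Undirected edges between distinct words at most window-1 positions apart.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    # Iterate S(w) = (1 - d) + d * sum over neighbors v of S(v) / |N(v)|.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    return scores


def multiword_candidates(tokens, keywords):
    """Join runs of keywords that are adjacent in the original token sequence."""
    candidates, run = [], []
    for t in list(tokens) + [None]:  # None is a sentinel that flushes the last run
        if t is not None and t in keywords:
            run.append(t)
            continue
        if len(run) > 1:
            candidates.append(tuple(run))
        run = []
    # Remove duplicates while keeping first-occurrence order.
    return list(dict.fromkeys(candidates))


def is_valid_multiword(candidate, dictionary, unigram_counts, bigram_counts,
                       threshold=0.5):
    """Accept a candidate found in the keyword dictionary; otherwise fall back
    to a corpus probability (hypothetical stand-in, not the paper's algorithm)."""
    if "".join(candidate) in dictionary:  # Chinese words join without spaces
        return True
    prob = 1.0
    for a, b in zip(candidate, candidate[1:]):
        if unigram_counts.get(a, 0) == 0:
            return False
        # Estimate P(b | a) = count(a b) / count(a) from the corpus counts.
        prob *= bigram_counts.get((a, b), 0) / unigram_counts[a]
    return prob >= threshold


if __name__ == "__main__":
    tokens = ["本文", "提出", "一种", "关键词", "提取", "方法", "，",
              "用于", "中文", "科技", "摘要", "的", "关键词", "提取", "。"]
    stop_words = {"本文", "一种", "，", "用于", "的", "。"}
    scores = textrank(tokens, stop_words)
    print(max(scores, key=scores.get))  # 关键词 (the highest-degree node)
    # Keyword set chosen by hand here; in practice take the top-ranked words.
    cands = multiword_candidates(tokens, {"关键词", "提取", "中文", "科技", "摘要"})
    print(cands)  # [('关键词', '提取'), ('中文', '科技', '摘要')]
```

Checking adjacency in the original text rather than in the stop-word-filtered sequence follows the abstract's wording ("adjacent in the abstract text"). A dictionary hit short-circuits the probability check, so known terms are never rejected by sparse corpus counts.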