Hot keyword identification for extracting web public opinion

The Internet is becoming an increasingly important platform for everyday life and work. Because the keywords of a document convey important information about its content, keyword extraction is expected to help people quickly find hot spots on the web. In this paper, we propose using a text clustering method based on semi-supervised learning to identify the focal social topics in a large volume of text. We develop a novel keyword extraction method named NATF-PDF, which is based on the TFPDF algorithm combined with supervised learning theory. We compare its performance with that of TFIDF, and the results show that our method achieves better accuracy and recall.
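For readers unfamiliar with the baseline that NATF-PDF builds on, the following is a minimal sketch of plain TF-PDF weighting as it is commonly formulated: a term's weight is the sum, over channels (news sources), of its normalized frequency in the channel multiplied by the exponential of the fraction of that channel's documents containing it. This is not the NATF-PDF method itself, whose modifications are not specified here. The function name `tf_pdf`, the input layout, and the sample data are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Compute plain TF-PDF weights (illustrative sketch, not NATF-PDF).

    channels: dict mapping a channel name to a list of documents,
              where each document is a list of already-tokenized terms.
    Returns a dict mapping each term to its summed weight over all channels.
    """
    weights = Counter()
    for docs in channels.values():
        n_docs = len(docs)
        if n_docs == 0:
            continue
        term_freq = Counter()  # occurrences of each term in this channel
        doc_freq = Counter()   # number of documents in this channel containing the term
        for doc in docs:
            term_freq.update(doc)
            doc_freq.update(set(doc))
        # Normalize frequencies by the Euclidean norm of the channel's frequency vector.
        norm = math.sqrt(sum(f * f for f in term_freq.values()))
        if norm == 0:
            continue
        for term, freq in term_freq.items():
            weights[term] += (freq / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)


# Hypothetical toy data: two channels with two short documents each.
sample = {
    "channel_a": [["flood", "rescue"], ["flood", "city"]],
    "channel_b": [["flood", "vote"], ["vote", "city"]],
}
scores = tf_pdf(sample)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Unlike TFIDF, which penalizes terms that appear in many documents, this weighting rewards terms that many documents within a channel discuss, which is why it is a natural starting point for hot-topic detection.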