论文信息 - News Keyword Extraction for Topic Tracking

News Keyword Extraction for Topic Tracking

This paper presents a keyword extraction technique that can be used for tracking topics over time. In our work, keywords are a set of significant words in an article that gives high-level description of its contents to readers. Identifying keywords from a large amount of on-line news data is very useful in that it can produce a short summary of news articles. As on-line text documents rapidly increase in size with the growth of WWW, keyword extraction has become a basis of several text mining applications such as search engine, text categorization, summarization, and topic detection. Manual keyword extraction is an extremely difficult and time consuming task; in fact, it is almost impossible to extract keywords manually in case of news articles published in a single day due to their volume. For a rapid use of keywords, we need to establish an automated process that extracts keywords from news articles. We propose an unsupervised keyword extraction technique that includes several variants of the conventional TF-IDF model with reasonable heuristics.

Han-Joon Kim | Sungjick Lee | Han-joon Kim | Sungjick Lee

[1] S. Robertson. The probability ranking principle in IR , 1997 .

[2] Michael D. Gordon,et al. A utility theoretic examination of the probability ranking principle in information retrieval , 1991, J. Am. Soc. Inf. Sci..

[3] Stephen E. Robertson,et al. Understanding inverse document frequency: on theoretical arguments for IDF , 2004, J. Documentation.