A Topic Detection and Tracking System with TF-Density

In the past, news was consumed predominantly through printed newspapers, which made coverage of a story hard to track. Today, the rapid growth of the Internet means that news is continually shared and stored on a previously unimaginable scale, and several stories on the same topic can be accessed from a single web page. In this paper, we propose a topic detection and tracking system built on a new word weighting scheme named TF-Density. TF-Density is derived from the well-known TF-IDF and TF-IWF algorithms and is intended to identify the important words in a text more precisely and efficiently. Our experiments show that the proposed system gives users more precise and more convenient results when tracking news.
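For readers unfamiliar with the two baseline schemes named above, the following is a minimal Python sketch of TF-IDF and TF-IWF under their commonly used formulations (term frequency multiplied by the log of an inverse document frequency or an inverse corpus word frequency). The function names, the token-list input format, and the unsmoothed logarithms are assumptions made for illustration. The abstract does not define TF-Density, so it is not reproduced here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (an assumed input format).
    Weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def tf_iwf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length)
             * log(total words in corpus / occurrences of the term in corpus).
    """
    # Corpus-wide occurrence count of each term.
    corpus_counts = Counter(term for doc in docs for term in doc)
    corpus_total = sum(corpus_counts.values())
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {
                t: (c / total) * math.log(corpus_total / corpus_counts[t])
                for t, c in counts.items()
            }
        )
    return weights


# Hypothetical toy corpus of two tokenized news stories.
docs = [["election", "vote", "vote"], ["election", "storm"]]
idf_weights = tf_idf(docs)
iwf_weights = tf_iwf(docs)
```

In this toy corpus, "election" appears in every document, so its TF-IDF weight is zero, while its TF-IWF weight stays positive because IWF counts word occurrences rather than documents. Differences of this kind are what motivate variants of the two schemes.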
