Hot Topic Detection Based on a Refined TF-IDF Algorithm

In this paper, we propose a refined term frequency-inverse document frequency (TF-IDF) algorithm, called TA TF-IDF, that finds hot terms based on time distribution information and user attention. We also put forward a method to generate new terms and combined terms that the Chinese word segmentation algorithm splits apart. We then extract hot news according to the hot terms and group it with K-means clustering to detect hot topics in news. The experimental results indicate that our method, based on the refined TF-IDF algorithm, can find hot topics effectively.
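For orientation, the baseline that TA TF-IDF refines can be sketched as follows. This is a minimal sketch of plain TF-IDF only: the abstract does not give the time-distribution or user-attention factors, so they are left out rather than guessed. The function names `tf_idf` and `top_terms`, the natural-log IDF, and the toy pre-tokenized documents are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (plain TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(docs, k=3):
    """Rank terms by TF-IDF weight summed over all documents (assumed scoring)."""
    scores = defaultdict(float)
    for doc_weights in tf_idf(docs):
        for term, weight in doc_weights.items():
            scores[term] += weight
    ranked = sorted(scores, key=lambda term: scores[term], reverse=True)
    return ranked[:k]


# Hypothetical, already-segmented news snippets.
docs = [
    ["earthquake", "rescue", "earthquake"],
    ["earthquake", "aid"],
    ["market", "stocks"],
]
candidate_hot_terms = top_terms(docs, k=3)
```

In this sketch a term that occurs in every document receives a weight of zero regardless of how often it is reported, which is one reason a plain TF-IDF score may be a weak signal for hotness on its own.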
