Detecting hot topics in technology news streams

Detecting hot topics at a fine granularity in technology news streams is an interesting and important problem, given the large number of reports and the relatively narrow range of topics they cover. In this paper, a three-phase method is proposed. In the first phase, a document-topic distribution vector is generated and keywords are extracted for each document using the pachinko allocation topic model. In the second phase, the documents are clustered with affinity propagation, based on the document-topic distribution vectors obtained in the first phase. In the last phase, actual events, denoted by combinations of keywords within each cluster, are identified using frequent pattern mining algorithms. We evaluate our approach on a collection of technology news reports gathered from various sites over a fixed time period. The results show that this method is effective.
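
As an illustration of the third phase only, the following is a minimal sketch of how keyword combinations shared by many documents in one cluster might be counted. It is not the algorithm used in the paper: it enumerates combinations by brute force rather than using a dedicated frequent pattern mining algorithm, and the function name `frequent_keyword_sets`, the parameters `min_support` and `max_size`, and the sample keywords are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_keyword_sets(docs, min_support, max_size=3):
    """Count keyword combinations across the documents of one cluster.

    `docs` is a list of keyword lists, one per document. Returns a dict
    mapping each combination (a sorted tuple of keywords) to the number
    of documents containing it, keeping only combinations found in at
    least `min_support` documents. Brute-force enumeration: suitable
    only for short keyword lists (hypothetical helper, not from the paper).
    """
    counts = Counter()
    for keywords in docs:
        # Deduplicate and sort so each combination has one canonical form.
        unique = sorted(set(keywords))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical cluster: each entry is the keyword list of one document.
cluster = [
    ["apple", "iphone", "launch"],
    ["apple", "iphone", "price"],
    ["apple", "launch", "iphone"],
    ["google", "android", "launch"],
]

patterns = frequent_keyword_sets(cluster, min_support=3)
for combo, support in sorted(patterns.items()):
    print(combo, support)
```

In this toy cluster, the pair `("apple", "iphone")` reaches the support threshold and could be read as one candidate event, while the larger triples do not. For realistic cluster sizes, an Apriori- or FP-growth-style algorithm would avoid enumerating every combination.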
