Forum topic detection based on hierarchical clustering

Forums have become one of the main platforms on which people express their personal points of view, and a large volume of new information is posted to them every day. Automatically detecting forum topics in this mass of information has become an important and difficult task. Although topic detection has been studied extensively, performing it both quickly and accurately remains a challenge. This paper introduces the principle of maximum entropy and information gain into the calculation of feature weights. Our algorithm is based on agglomerative hierarchical clustering (AHC). The experiments focus on a game forum and on handling sparse, short forum texts. The results show that the improved method detects forum topics more effectively.
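As an illustration of the clustering step only, the following is a minimal sketch of agglomerative hierarchical clustering over short posts. It is not the paper's implementation: it assumes raw term counts as feature weights (the maximum-entropy and information-gain weighting is not specified in the abstract), cosine similarity, average linkage, and a hypothetical stopping threshold of 0.3. The function names and the sample posts are invented for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def average_link(c1, c2, vecs):
    """Average pairwise similarity between the members of two clusters."""
    total = sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def ahc(posts, threshold=0.3):
    """Merge the most similar pair of clusters until no pair reaches threshold.

    Assumption: plain term counts stand in for the paper's feature weights.
    """
    vecs = [Counter(p.lower().split()) for p in posts]
    clusters = [[i] for i in range(len(posts))]  # start with one post per cluster
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = average_link(clusters[x], clusters[y], vecs)
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical game-forum posts, invented for this example.
posts = [
    "new patch nerfs the mage class",
    "mage class nerfs in the new patch",
    "server lag during raid tonight",
    "raid lag on the server again",
]
clusters = ahc(posts)
print(clusters)
```

Each resulting cluster is a list of post indices that can be read as one candidate topic. Swapping in a different weighting scheme would only change how `vecs` is built.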
