On-line Hot Topic Recommendation Using Tolerance Rough Set Based Topic Clustering

In this paper we present an online hot topic detection and label extraction method for our hot topic recommendation system. A new topical feature selection method compresses the feature space to a size suitable for an online system. The tolerance rough set model is then used to enrich the small set of topical feature words into a topical approximation space. Web pages are clustered into groups according to a distance defined on this topical approximation space, and the groups are then merged based on document overlap. Topic labels are extracted from the topical approximation space, enriched with the useful but high-frequency topical words that the clustering process drops. Experiments show that our method generates more information-rich classes and more topical class labels, and alleviates the topical drift caused by non-topical and noise words.
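
As a rough illustration of the enrichment step, the sketch below follows the commonly used tolerance rough set formulation for text, in which two words are tolerant if they co-occur in at least a threshold number of documents, and the upper approximation of a document contains every word whose tolerance class overlaps it. This is a minimal sketch under stated assumptions, not the method evaluated in this paper: the function names, the threshold `theta`, the toy documents, and the use of Jaccard distance on the enriched sets are all assumptions introduced for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents."""
    cooc = defaultdict(int)
    for doc in docs:
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for doc in docs for t in doc}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class shares a term with `doc`."""
    doc = set(doc)
    return {t for t, cls in classes.items() if cls & doc}


def jaccard_distance(a, b):
    """Assumed distance for illustration: 1 - |a & b| / |a | b|."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


# Hypothetical toy documents, each reduced to a set of topical feature words.
docs = [
    {"rough", "set", "cluster"},
    {"rough", "set", "topic"},
    {"rough", "label"},
]
classes = tolerance_classes(docs, theta=2)
enriched = [upper_approximation(d, classes) for d in docs]
```

In this toy data "rough" and "set" co-occur in two documents, so the third document, which contains only "rough" and "label", is enriched with "set". Its distance to the first document is therefore smaller on the enriched sets than on the raw word sets, which is the effect the approximation space is meant to provide for sparse feature sets.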
