ReCoM: reinforcement clustering of multi-type interrelated data objects

Most existing clustering algorithms cluster highly related data objects such as Web pages and Web users separately. The interrelation among different types of data objects is either not considered, or represented by a static feature space and treated in the same ways as other attributes of the objects. In this paper, we propose a novel clustering approach for clustering multi-type interrelated data objects, ReCoM (Reinforcement Clustering of Multi-type Interrelated data objects). Under this approach, relationships among data objects are used to improve the cluster quality of interrelated data objects through an iterative reinforcement clustering process. At the same time, the link structure derived from relationships of the interrelated data objects is used to differentiate the importance of objects and the learned importance is also used in the clustering process to further improve the clustering results. Experimental results show that the proposed approach not only effectively overcomes the problem of data sparseness caused by the high dimensional relationship space but also significantly improves the clustering accuracy.

[1]  Ben Taskar,et al.  Probabilistic Classification and Clustering in Relational Data , 2001, IJCAI.

[2]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[3]  Katia P. Sycara,et al.  WebMate: a personal agent for browsing and searching , 1998, AGENTS '98.

[4]  Soumen Chakrabarti,et al.  Data mining for hypertext (tutorial session) (title only) , 2000, KDD '00.

[5]  Jon M. Kleinberg,et al.  Inferring Web communities from link topology , 1998, HYPERTEXT '98.

[6]  Qiang Yang,et al.  Correlation-based document clustering using web logs , 2001, Proceedings of the 34th Annual Hawaii International Conference on System Sciences.

[7]  Robert L. Grossman,et al.  Data Mining for Scientific and Engineering Applications , 2001, Massive Computing.

[8]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[9]  David A. Cohn,et al.  The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity , 2000, NIPS.

[10]  Jeffrey Heer,et al.  Identification of Web User Traffic Composition using Multi-Modal Clustering and Information Scent , 2000 .

[11]  David Heckerman,et al.  Empirical Analysis of Predictive Algorithms for Collaborative Filtering , 1998, UAI.

[12]  Inderjit S. Dhillon,et al.  Efficient Clustering of Very Large Document Collections , 2001 .

[13]  Mark Craven,et al.  Combining Statistical and Relational Methods for Learning in Hypertext Domains , 1998, ILP.

[14]  Dean P. Foster,et al.  Clustering Methods for Collaborative Filtering , 1998, AAAI 1998.

[15]  Philip S. Yu,et al.  Clustering through decision tree construction , 2000, CIKM '00.

[16]  Ji-Rong Wen,et al.  Query clustering using user logs , 2002, TOIS.

[17]  Wei-Ying Ma,et al.  A unified framework for clustering heterogeneous Web objects , 2002, Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002..

[18]  Jennifer Neville,et al.  Iterative Classification in Relational Data , 2000 .

[19]  George Karypis,et al.  A Comparison of Document Clustering Techniques , 2000 .

[20]  Soumen Chakrabarti,et al.  Data mining for hypertext: a tutorial survey , 2000, SKDD.