Paper Information - RCHIG: An Effective Clustering Algorithm with Ranking

This paper addresses the problem of clustering objects of a specified type in a heterogeneous information graph while also producing, based on these clusters, ranking information for objects of all types. We propose a novel clustering framework, RCHIG, that directly generates clusters integrated with ranking. Starting from K initial clusters, ranking is applied to each cluster separately, and the resulting rank distribution serves as a good measure of that cluster. We then use a mixture model to decompose each object into a K-dimensional vector in which each dimension is the object's component coefficient with respect to one cluster, measured by that cluster's rank distribution. Objects are then reassigned to the nearest cluster in this new measure space to improve the clustering. As a result, clustering quality and ranking quality enhance each other: the clusters become more accurate and the rankings more meaningful. This progressive refinement iterates until little further change occurs. Our experimental results show that RCHIG generates more accurate clusters, and does so more efficiently, than state-of-the-art link-based clustering methods. Moreover, clustering results with ranks provide more informative views of the data than traditional clustering does.
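The refinement loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the graph is reduced to a bipartite weight matrix between target objects and one attribute type, the ranking function is a simple smoothed link-weight score (the abstract does not specify the ranking function), the K-dimensional vector is a one-shot posterior under the per-cluster rank distributions rather than a fitted mixture model, and "nearest" is taken to mean smallest cosine distance to the cluster's mean vector. All function and variable names are hypothetical.

```python
import math

def rank_distribution(weights, members, n_attr, smoothing=1e-3):
    """Assumed stand-in ranking: score attribute objects by their total
    link weight to the cluster's members, then normalize to sum to 1."""
    scores = [smoothing] * n_attr
    for x in members:
        for y, w in enumerate(weights[x]):
            scores[y] += w
    total = sum(scores)
    return [s / total for s in scores]

def membership_vector(row, rank_dists, priors):
    """K-dimensional vector for one target object: the posterior of each
    cluster given the object's links, computed in log space for stability."""
    log_post = [
        math.log(priors[c])
        + sum(w * math.log(rank_dists[c][y]) for y, w in enumerate(row) if w)
        for c in range(len(rank_dists))
    ]
    peak = max(log_post)
    unnorm = [math.exp(v - peak) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def cosine_distance(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return 1.0 - dot / norm if norm else 1.0

def group_by_label(labels, k):
    return [[x for x, lab in enumerate(labels) if lab == c] for c in range(k)]

def cluster_with_ranking(weights, labels, k, max_iter=20):
    """Alternate per-cluster ranking and reassignment until labels stop changing."""
    n, n_attr = len(weights), len(weights[0])
    labels = list(labels)
    for _ in range(max_iter):
        groups = group_by_label(labels, k)
        rank_dists = [rank_distribution(weights, g, n_attr) for g in groups]
        priors = [(len(g) + 1) / (n + k) for g in groups]  # smoothed cluster sizes
        vectors = [membership_vector(row, rank_dists, priors) for row in weights]
        centers = []
        for c, g in enumerate(groups):
            if g:  # mean vector of the cluster's current members
                centers.append([sum(vectors[x][d] for x in g) / len(g) for d in range(k)])
            else:  # empty cluster: fall back to its ideal one-hot vector
                centers.append([1.0 if d == c else 0.0 for d in range(k)])
        new_labels = [min(range(k), key=lambda c: cosine_distance(v, centers[c]))
                      for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
    # Recompute rankings so they match the final labels.
    groups = group_by_label(labels, k)
    return labels, [rank_distribution(weights, g, n_attr) for g in groups]

# Toy data: 4 target objects (rows) linked to 4 attribute objects (columns).
toy_weights = [[5, 3, 0, 0], [4, 2, 0, 0], [0, 0, 3, 4], [0, 1, 2, 5]]
final_labels, final_ranks = cluster_with_ranking(toy_weights, [0, 1, 1, 1], k=2)
print(final_labels)
```

In this toy run the second object starts in the wrong cluster. Each returned rank distribution lists, per cluster, a normalized score for every attribute object, so the highest-scoring entries play the role of that cluster's top-ranked objects.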

Jianwen Tao
