Clustering and Ranking in Heterogeneous Information Networks via Gamma-Poisson Model

Clustering and ranking have each been applied successfully to homogeneous information networks, which contain only one type of object. However, real-world information networks are often heterogeneous, containing multiple types of objects and links. Recent research has shown that clustering and ranking can mutually enhance each other, and several techniques have been developed to integrate the two on a heterogeneous information network. To the best of our knowledge, however, all such techniques assume that the network follows a certain schema. In this paper, we propose a probabilistic generative model that simultaneously achieves clustering and ranking on a heterogeneous network with an arbitrary schema. In this model, the edges of each type are sampled from a Poisson distribution whose parameters are determined by the ranking scores of the nodes in each cluster. We propose a variational Bayesian inference method to learn these parameters, which are then used to output rankings and clusters simultaneously. Our method is evaluated on both synthetic networks and real-world networks extracted from the DBLP and YELP data. Experimental results show that our method outperforms the state-of-the-art baselines.
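The generative view described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact parameterization, so everything below is an assumption made for illustration: each node gets a nonnegative ranking score per cluster drawn from a Gamma prior, and the edge weight between two nodes is Poisson with rate equal to the sum over clusters of the product of their scores. Node and edge types are omitted, the hyperparameters `shape` and `scale` are hypothetical, and the sketch only samples from the model; it does not perform the variational inference.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_network(n_nodes, n_clusters, shape=0.5, scale=1.0, seed=0):
    """Sample ranking scores and Poisson edge weights (illustrative only)."""
    rng = random.Random(seed)
    # scores[i][k]: assumed ranking score of node i in cluster k (Gamma prior).
    scores = [
        [rng.gammavariate(shape, scale) for _ in range(n_clusters)]
        for _ in range(n_nodes)
    ]
    edges = {}
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            # Assumed rate: nodes that score highly in the same cluster
            # are expected to share more links.
            rate = sum(a * b for a, b in zip(scores[i], scores[j]))
            weight = sample_poisson(rate, rng)
            if weight > 0:
                edges[(i, j)] = weight
    return scores, edges


def hard_clusters(scores):
    """Assign each node to the cluster where its score is largest."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def rank_in_cluster(scores, cluster):
    """Order node indices by descending score within one cluster."""
    return sorted(range(len(scores)), key=lambda i: scores[i][cluster], reverse=True)


if __name__ == "__main__":
    scores, edges = sample_network(n_nodes=8, n_clusters=2, seed=42)
    print("clusters:", hard_clusters(scores))
    print("ranking in cluster 0:", rank_in_cluster(scores, 0))
    print("number of sampled edges:", len(edges))
```

In this reading, a single set of per-cluster scores yields both outputs at once: the largest score gives a cluster assignment, and sorting by score inside a cluster gives a ranking.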
