A novel probabilistic clustering model for heterogeneous networks

Heterogeneous networks, consisting of multi-type objects coupled with various relations, are ubiquitous in the real world. Most previous work on clustering heterogeneous networks either converts them into homogeneous networks or simplifies the modeling of heterogeneity in terms of specific objects, structures or assumptions. However, few studies consider all relevant objects and relations while trading off the integration of relevant objects against the noise introduced by relations across objects. In this paper, we propose a general probabilistic graphical model for clustering heterogeneous networks. First, we present a novel graphical representation based on two basic assumptions: different relation types produce different weight distributions that specify the intra-cluster probability between two objects, and clusters are formed around cluster cores. Then, we derive an efficient algorithm called PROCESS, standing for PRObabilistic Clustering modEl for heterogeneouS networkS. PROCESS employs a balance-controlled message passing algorithm and mathematical programming for inference and estimation. Experimental results show that our approach is effective and significantly outperforms state-of-the-art algorithms on both synthetic and real data from heterogeneous networks.
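The two modeling assumptions can be illustrated with a toy sketch. This is not the PROCESS algorithm: it performs no probabilistic inference, message passing, or parameter estimation. The objects, relation names, cluster cores, and weight values below are all hypothetical, and the scoring rule (a weighted count of links to each core) is an assumption chosen only to show how relation-specific weights can change a cluster assignment compared with treating every link alike.

```python
from collections import defaultdict

# Hypothetical typed edges: (object, relation_type, cluster_core, link_count).
EDGES = [
    ("alice", "publishes_in", "KDD", 1),
    ("alice", "cites", "ICML", 2),
    ("bob", "publishes_in", "ICML", 1),
]


def assign_to_cores(edges, relation_weight):
    """Assign each object to the core with the largest weighted link score."""
    scores = defaultdict(lambda: defaultdict(float))
    for obj, relation, core, count in edges:
        scores[obj][core] += relation_weight[relation] * count
    return {
        obj: max(core_scores, key=core_scores.get)
        for obj, core_scores in scores.items()
    }


# Relation-aware weights (assumed values, not estimated from data).
typed = assign_to_cores(EDGES, {"publishes_in": 0.9, "cites": 0.3})

# Uniform weights mimic collapsing the network into a homogeneous one.
uniform = assign_to_cores(EDGES, {"publishes_in": 1.0, "cites": 1.0})

print(typed)
print(uniform)
```

With the assumed weights, the single strong `publishes_in` link outweighs two weak `cites` links, so `alice` joins the `KDD` core. With uniform weights, the raw link count wins and she moves to `ICML`.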
