Social Influence Based Clustering of Heterogeneous Information Networks

Social networks continue to grow in size and the type of information hosted. We witness a growing interest in clustering a social network of people based on both their social relationships and their participations in activity based information networks. In this paper, we present a social influence based clustering framework for analyzing heterogeneous information networks with three unique features. First, we introduce a novel social influence based vertex similarity metric in terms of both self-influence similarity and co-influence similarity. We compute self-influence and coinfluence based similarity based on social graph and its associated activity graphs and influence graphs respectively. Second, we compute the combined social influence based similarity between each pair of vertices by unifying the self-similarity and multiple co-influence similarity scores through a weight function with an iterative update method. Third, we design an iterative learning algorithm, SI-Cluster, to dynamically refine the K clusters by continuously quantifying and adjusting the weights on self-influence similarity and on multiple co-influence similarity scores towards the clustering convergence. To make SI-Cluster converge fast, we transformed a sophisticated nonlinear fractional programming problem of multiple weights into a straightforward nonlinear parametric programming problem of single variable. Our experiment results show that SI-Cluster not only achieves a better balance between self-influence and co-influence similarities but also scales extremely well for large graph clustering.

[1]  Hong Cheng,et al.  A model-based approach to attributed graph clustering , 2012, SIGMOD Conference.

[2]  Éva Tardos,et al.  Maximizing the Spread of Influence through a Social Network , 2015, Theory Comput..

[3]  Michael R. Lyu,et al.  Mining social networks using heat diffusion processes for marketing candidates selection , 2008, CIKM '08.

[4]  Matthew Richardson,et al.  Mining the network value of customers , 2001, KDD '01.

[5]  Yizhou Sun,et al.  RankClus: integrating clustering with ranking for heterogeneous information network analysis , 2009, EDBT '09.

[6]  Jure Leskovec,et al.  Information diffusion and external influence in networks , 2012, KDD.

[7]  Yossi Matias,et al.  Suggesting friends using the implicit social graph , 2010, KDD.

[8]  Hong Cheng,et al.  Graph Clustering Based on Structural/Attribute Similarities , 2009, Proc. VLDB Endow..

[9]  Jiawei Han,et al.  Community Mining from Multi-relational Networks , 2005, PKDD.

[10]  Yoshua Bengio,et al.  Convergence Properties of the K-Means Algorithms , 1994, NIPS.

[11]  Philip S. Yu,et al.  Integrating meta-path selection with user-guided object clustering in heterogeneous information networks , 2012, KDD.

[12]  Jignesh M. Patel,et al.  Efficient aggregation for graph summarization , 2008, SIGMOD Conference.

[13]  Ichigaku Takigawa,et al.  A spectral clustering approach to optimally combining numericalvectors with a modular network , 2007, KDD '07.

[14]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[15]  Jignesh M. Patel,et al.  Discovery-driven graph summarization , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[16]  Ambuj K. Singh,et al.  Scalable discovery of best clusters on large graphs , 2010, Proc. VLDB Endow..

[17]  Yizhou Sun,et al.  Query-driven discovery of semantically similar substructures in heterogeneous networks , 2012, KDD.

[18]  Jiawei Han,et al.  Ranking-based classification of heterogeneous information networks , 2011, KDD.

[19]  Srinivasan Parthasarathy,et al.  Scalable graph clustering using stochastic flows: applications to community discovery , 2009, KDD.

[20]  Yun Chi,et al.  Combining link and content for community detection: a discriminative approach , 2009, KDD.

[21]  Ben Taskar,et al.  Probabilistic Classification and Clustering in Relational Data , 2001, IJCAI.

[22]  R. Tyrrell Rockafellar,et al.  Convex Analysis , 1970, Princeton Landmarks in Mathematics and Physics.

[23]  Hong Cheng,et al.  Clustering Large Attributed Graphs: An Efficient Incremental Approach , 2010, 2010 IEEE International Conference on Data Mining.

[24]  Charu C. Aggarwal,et al.  Relation Strength-Aware Clustering of Heterogeneous Information Networks with Incomplete Attributes , 2012, Proc. VLDB Endow..

[25]  Peter J. Rousseeuw,et al.  Clustering by means of medoids , 1987 .

[26]  Xiaowei Xu,et al.  SCAN: a structural clustering algorithm for networks , 2007, KDD '07.