A model-based approach to attributed graph clustering

Graph clustering, also known as community detection, is a long-standing problem in data mining. However, with the proliferation of rich attribute information available for objects in real-world graphs, how to leverage both structural and attribute information when clustering attributed graphs has become a new challenge. Most existing works take a distance-based approach: they propose various distance measures that combine structural and attribute information. In this paper, we consider an alternative view and propose a model-based approach to attributed graph clustering. We develop a Bayesian probabilistic model for attributed graphs. The model provides a principled and natural framework for capturing both the structural and the attribute aspects of a graph, while avoiding the artificial design of a distance measure. Clustering with the proposed model can be transformed into a probabilistic inference problem, for which we devise an efficient variational algorithm. Experimental results on large real-world datasets demonstrate that our method significantly outperforms the state-of-the-art distance-based attributed graph clustering method.
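To make the model-based view concrete, the following is a minimal sketch of one plausible generative process for an attributed graph. The abstract does not give the model's details, so everything here is an illustrative assumption: the function name `sample_attributed_graph`, the symmetric Dirichlet and Beta(1, 1) priors, categorical attributes, and directed edges whose probability depends only on the clusters of the two endpoints. Under such a model, clustering amounts to inferring the hidden labels `z` from the observed attributes `X` and adjacency matrix `A`; the sketch covers only the forward (sampling) direction, not the variational inference algorithm.

```python
import numpy as np

def sample_attributed_graph(n_nodes, n_clusters, attr_sizes, seed=0):
    """Sample cluster labels, categorical attributes, and a directed graph.

    Hypothetical illustration only: priors and structure are assumptions,
    not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Cluster proportions, then one hidden cluster label per node.
    alpha = rng.dirichlet(np.ones(n_clusters))
    z = rng.choice(n_clusters, size=n_nodes, p=alpha)

    # For attribute t with attr_sizes[t] possible values, each cluster
    # has its own categorical distribution; theta[t] has shape (K, m_t).
    theta = [rng.dirichlet(np.ones(m), size=n_clusters) for m in attr_sizes]
    X = np.empty((n_nodes, len(attr_sizes)), dtype=int)
    for t, m in enumerate(attr_sizes):
        for i in range(n_nodes):
            X[i, t] = rng.choice(m, p=theta[t][z[i]])

    # Edge probability for every ordered pair of clusters.
    phi = rng.beta(1.0, 1.0, size=(n_clusters, n_clusters))
    edge_prob = phi[z][:, z]  # edge_prob[i, j] == phi[z[i], z[j]]
    A = (rng.random((n_nodes, n_nodes)) < edge_prob).astype(int)
    np.fill_diagonal(A, 0)  # no self-loops

    return z, X, A
```

For example, `z, X, A = sample_attributed_graph(50, 3, [4, 2])` draws 50 nodes in up to 3 clusters with two categorical attributes. Because both `X` and `A` are generated from the same hidden labels, structure and attributes are tied together through the model itself rather than through a hand-designed distance measure.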
