SSDE-Cluster: Fast Overlapping Clustering of Networks Using Sampled Spectral Distance Embedding and GMMs

Clustering social networks is vital to understanding online interactions and influence. The task becomes harder when communities overlap and when the networks become extremely large. We present an efficient algorithm, with approximately linear running time, for constructing overlapping clusters. The algorithm first embeds the graph and then performs a metric clustering of the embedded points using a Gaussian Mixture Model (GMM). We evaluate the algorithm on the DBLP paper-paper network, which has about 1 million nodes and over 30 million edges; we can cluster this network in under 20 minutes on a modest single-CPU machine.
