Spectral Clustering in Social Networks

We evaluate various heuristics for hierarchical spectral clustering in large telephone call and Web graphs. Spectral clustering without additional heuristics often produces very uneven cluster sizes or low quality clusters that may consist of several disconnected components, a fact that appears to be common for several data sources but, to our knowledge, no general solution provided so far. Divide-and-Merge, a recently described postfiltering procedure may be used to eliminate bad quality branches in a binary tree hierarchy. We propose an alternate solution that enables k -way cuts in each step by immediately filtering unbalanced or low quality clusters before splitting them further. Our experiments are performed on graphs with various weight and normalization built based on call detail records and Web crawls. We measure clustering quality both by modularity as well as by the geographic and topical homogeneity of the clusters. Compared to divide-and-merge, we give more homogeneous clusters with a more desirable distribution of the cluster sizes.

[1]  Andrew B. Kahng,et al.  Multiway partitioning via geometric embeddings, orderings, and dynamic programming , 1995, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[2]  Sebastiano Vigna,et al.  UbiCrawler: a scalable fully distributed Web crawler , 2004, Softw. Pract. Exp..

[3]  Chris H. Q. Ding,et al.  A min-max cut algorithm for graph partitioning and data clustering , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[4]  Jitendra Malik,et al.  Contour and Texture Analysis for Image Segmentation , 2001, International Journal of Computer Vision.

[5]  Graham J. Wills NicheWorks—Interactive Visualization of Very Large Graphs , 1999 .

[6]  Andrew B. Kahng,et al.  Recent directions in netlist partitioning: a survey , 1995, Integr..

[7]  Fan Chung Graham,et al.  A random graph model for massive graphs , 2000, STOC '00.

[8]  Jitendra Malik,et al.  Normalized Cuts and Image Segmentation , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  F. Chung,et al.  Spectra of random graphs with given expected degrees , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[10]  T. Vicsek,et al.  Clique percolation in random networks. , 2005, Physical review letters.

[11]  M. Fiedler Algebraic connectivity of graphs , 1973 .

[12]  Ulrike von Luxburg,et al.  Limits of Spectral Clustering , 2004, NIPS.

[13]  Ravi Kumar,et al.  Structure and evolution of blogspace , 2004, CACM.

[14]  Jianbo Shi,et al.  A Random Walks View of Spectral Segmentation , 2001, AISTATS.

[15]  Alan M. Frieze,et al.  Clustering Large Graphs via the Singular Value Decomposition , 2004, Machine Learning.

[16]  F. Chung,et al.  Eigenvalues of Random Power law Graphs , 2003 .

[17]  Xin Yao,et al.  A novel evolutionary data mining algorithm with applications to churn prediction , 2003, IEEE Trans. Evol. Comput..

[18]  Charles J. Alpert,et al.  Spectral Partitioning: The More Eigenvectors, The Better , 1995, 32nd Design Automation Conference.

[19]  Matthew Richardson,et al.  Mining knowledge-sharing sites for viral marketing , 2002, KDD.

[20]  E. Barnes An algorithm for partitioning the nodes of a graph , 1981, 1981 20th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes.

[21]  Chih-Ping Wei,et al.  Turning telecommunications call details to churn prediction: a data mining approach , 2002, Expert Syst. Appl..

[22]  Michael W. Berry,et al.  SVDPACK: A Fortran-77 Software Library for the Sparse Singular Value Decomposition , 1992 .

[23]  Alan M. Frieze,et al.  Fast Monte-Carlo algorithms for finding low-rank approximations , 1998, Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280).

[24]  Chris H. Q. Ding,et al.  A spectral method to separate disconnected and nearly-disconnected web graph components , 2001, KDD '01.

[25]  F. Chung,et al.  The average distances in random graphs with given expected degrees , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[26]  Andrew B. Kahng,et al.  New spectral methods for ratio cut partitioning and clustering , 1991, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[27]  Hector Garcia-Molina,et al.  Web Content Categorization Using Link Information , 2006 .

[28]  Chris H. Q. Ding,et al.  Spectral Relaxation for K-means Clustering , 2001, NIPS.

[29]  A-L Barabási,et al.  Structure and tie strengths in mobile communication networks , 2006, Proceedings of the National Academy of Sciences.

[30]  Martine D. F. Schlag,et al.  Spectral K-Way Ratio-Cut Partitioning and Clustering , 1993, 30th ACM/IEEE Design Automation Conference.

[31]  Sougata Mukherjea,et al.  On the structural properties of massive telecom call graphs: findings and implications , 2006, CIKM '06.

[32]  Piotr Indyk,et al.  Fast mining of massive tabular data via approximate distance computations , 2002, Proceedings 18th International Conference on Data Engineering.

[33]  Santosh S. Vempala,et al.  On clusterings-good, bad and spectral , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[34]  Ichigaku Takigawa,et al.  A spectral clustering approach to optimally combining numericalvectors with a modular network , 2007, KDD '07.

[35]  András A. Benczúr,et al.  To randomize or not to randomize: space optimal summaries for hyperlink analysis , 2006, WWW '06.

[36]  Pavel Zakharov Structure of LiveJournal social network , 2007, SPIE International Symposium on Fluctuations and Noise.

[37]  Tamás Sarlós,et al.  Improved Approximation Algorithms for Large Matrices via Random Projections , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[38]  Yair Weiss,et al.  Segmentation using eigenvectors: a unifying view , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[39]  Santosh S. Vempala,et al.  A divide-and-merge methodology for clustering , 2005, PODS '05.

[40]  Kevin J. Lang Fixing two weaknesses of the Spectral Method , 2005, NIPS.

[41]  Ronald J. Brachman,et al.  Brief Application Description; Visual Data Mining: Recognizing Telephone Calling Fraud , 2004, Data Mining and Knowledge Discovery.

[42]  A. Hoffman,et al.  Lower bounds for the partitioning of graphs , 1973 .