Network Clustering Approximation Algorithm Using One Pass Black Box Sampling

Finding a good clustering of vertices in a network, where vertices in the same cluster are more tightly connected than those in different clusters, is a useful, important, and well-studied task. Many clustering algorithms scale well; however, they are not designed to operate on internet-scale networks with billions of nodes or more. We study one of the fastest and most memory-efficient algorithms possible: clustering based on the connected components of a random edge-induced subgraph. Defining the cost of a clustering as its distance from such a random clustering, we show that this surprisingly simple algorithm gives a solution that is within an expected factor of two or three of optimal under either of two natural distance functions. In fact, this approximation guarantee holds for any problem in which there is a probability distribution over clusterings. We then examine the behavior of this algorithm in the context of social network trust inference.
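The core procedure (take a random edge-induced subgraph, then report its connected components) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that each edge arrives as a hypothetical `(u, v, p)` triple, where `p` is the independent probability that the edge is kept, for example a trust value in the social-network setting. The names `sample_clustering` and `DisjointSet` are illustrative only. Edges are read once as a stream, each edge is kept with probability `p`, and a union-find structure merges the endpoints of kept edges, so memory grows with the number of vertices rather than the number of edges.

```python
import random
from typing import Dict, Hashable, Iterable, List, Optional, Set, Tuple

# Hypothetical input format: (u, v, p), where p is the probability the edge is kept.
Edge = Tuple[Hashable, Hashable, float]


class DisjointSet:
    """Union-find with path halving and union by size (illustrative helper)."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}

    def add(self, x: Hashable) -> None:
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x: Hashable) -> Hashable:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:  # attach the smaller tree under the larger
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def sample_clustering(
    vertices: Iterable[Hashable],
    edges: Iterable[Edge],
    seed: Optional[int] = None,
) -> List[Set[Hashable]]:
    """Return the connected components of one random edge-induced subgraph.

    Hypothetical sketch: every edge is kept independently with its own probability p.
    """
    rng = random.Random(seed)
    ds = DisjointSet()
    for v in vertices:
        ds.add(v)  # isolated vertices become singleton clusters
    for u, v, p in edges:  # single pass over the edge stream
        ds.add(u)
        ds.add(v)
        if rng.random() < p:  # keep the edge with probability p
            ds.union(u, v)
    clusters: Dict[Hashable, Set[Hashable]] = {}
    for x in list(ds.parent):
        clusters.setdefault(ds.find(x), set()).add(x)
    return list(clusters.values())
```

For instance, with edges `(1, 2, 1.0)`, `(2, 3, 1.0)`, `(3, 4, 0.0)`, and `(4, 5, 1.0)`, the edges with probability 1.0 are always kept and the edge with probability 0.0 never is. The sketch therefore always returns the clusters `{1, 2, 3}` and `{4, 5}`. With fractional probabilities, repeated calls draw different clusterings from the underlying distribution.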
