Clustering Coefficient Queries on Massive Dynamic Social Networks

The clustering coefficient (CC) is a fundamental measure in social network analysis that assesses the degree to which nodes tend to cluster together. While CC computation on static graphs is well studied, emerging applications require online queries for the "global" CC of a given subset of a graph. Because social networks are commonly stored in databases for easy updating and access, computing the CC of a subset becomes time-consuming, especially when the network grows too large to fit in memory. This paper presents a novel method, the Approximate Neighborhood Index (ANI), that significantly reduces the query latency of CC computation compared with traditional SQL-based database queries. ANI is built from a Bloom-filter-like data structure placed in front of a relational database. Experimental results show that the proposed approach can guarantee the correctness of a CC query while significantly reducing query latency at a reasonable memory cost.
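To make the idea concrete, the following is a minimal sketch, not the paper's ANI. It assumes that the "global" CC of a subset means the transitivity of the induced subgraph (closed triples divided by all connected triples), and it uses an in-memory dictionary as a stand-in for the relational database. The names `EdgeBloomFilter`, `build`, and `global_cc` are hypothetical. A small Bloom filter over edges answers "definitely no edge" cheaply; any positive answer is confirmed against the exact store, so false positives cannot change the result.

```python
import hashlib
from itertools import combinations


class EdgeBloomFilter:
    """Tiny Bloom filter over undirected edges (illustrative; not the paper's ANI)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, u, v):
        a, b = sorted((u, v))  # undirected: (u, v) and (v, u) hash alike
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{a}:{b}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, u, v):
        for p in self._positions(u, v):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, u, v):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(u, v))


def build(edges):
    """Build the exact adjacency store (database stand-in) and the filter."""
    neighbors, bloom = {}, EdgeBloomFilter()
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
        bloom.add(u, v)
    return neighbors, bloom


def global_cc(nodes, neighbors, bloom):
    """Transitivity of the subgraph induced by `nodes` (assumed definition)."""
    subset = set(nodes)
    closed = triples = 0
    for center in subset:
        local = [n for n in neighbors.get(center, ()) if n in subset]
        for u, v in combinations(local, 2):
            triples += 1
            # A Bloom filter has no false negatives, so a miss means "no edge".
            # A hit is verified against the exact store to rule out false positives.
            if bloom.might_contain(u, v) and v in neighbors.get(u, ()):
                closed += 1
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    neighbors, bloom = build([(1, 2), (2, 3), (1, 3), (3, 4)])
    print(global_cc([1, 2, 3, 4], neighbors, bloom))
    print(global_cc([1, 2, 3], neighbors, bloom))
```

In this toy graph, the full node set contains five connected triples, three of which are closed by the single triangle, while the subset {1, 2, 3} is a complete triangle. The filter only saves work when most candidate pairs are not edges, which is the typical case in sparse social graphs.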
