DiCeS: Detecting Communities in Network Streams over the Cloud

We consider the problem of uncovering communities in complex real-world networks whose nodes and associations arrive as streams of data. Although community detection has received much attention in centralized settings, the prevalence of online social networks has produced unprecedented volumes of data whose handling calls for novel streaming approaches. Moreover, the bursty production of network interactions necessitates cloud-enabled techniques that can both cope with varying data rates and deploy additional computing resources on the fly for improved performance. We propose a distributed streaming community detection approach termed DiCeS and implement it as a cloud application. The novelty of our approach lies in balancing the incoming load across a cluster of computing nodes while seeking communities, and in adjusting the cluster's processing capacity in an elastic manner. We also provide fault tolerance by ensuring that temporarily suspended or failed nodes are restored and that all edges of the network stream ultimately receive their due processing. Lastly, DiCeS is interactive in that it supports i) updating the target communities, and ii) obtaining results on demand. Our experimental results demonstrate that DiCeS handles the edges of real-world network streams at high rates, allows for near-linear scaling, and outperforms previous non-distributed approaches. Using ground-truth communities for a wide range of large real-world networks, we also show that DiCeS attains improved accuracy compared to earlier centralized algorithms.
