Fast Uncovering of Graph Communities on a Chip: Toward Scalable Community Detection on Multicore and Manycore Platforms

Graph representations are pervasive in scientific and social computing. They serve as vital tools to model the interplay among different interacting entities. In this paper, we visit the problem of community detection, which is one of the most widely used graph operations toward scientific discovery. Community detection refers to the process of identifying tightly knit subgroups of vertices in a large graph. These subgroups or communities represent vertices that are tied together through common structure or function. Identification of communities could help in understanding the modular organization of complex networks. However, owing to large data sizes and high computational costs, performing community detection at scale has become increasingly challenging. Here, we present a detailed review and analysis of some of the leading computational methods and implementations developed for executing community detection on modern-day multicore and manycore architectures. Our goals are to: (a) define the problem of community detection and highlight its scientific significance; (b) relate to challenges in parallelizing the operation on modern-day architectures; (c) provide a detailed report and logical organization of the approaches that have been designed for various architectures; and (d) finally, provide insights into the strengths and suitability of different architectures for community detection, and a preview into the future trends of the area. It is our hope that this detailed treatment of community detection on parallel architectures can serve as an exemplar study for extending the application of modern-day multicore and manycore architectures to other complex graph applications.
