Crunching Large Graphs with Commodity Processors

Crunching large graphs is the basis of many emerging applications, such as social network analysis and bioinformatics. Graph analytics algorithms exhibit little locality and therefore present significant performance challenges. Hardware multithreading systems (e.g., the Cray XMT) show that, with enough concurrency, we can tolerate long memory latencies. Unfortunately, this solution is not available with commodity parts. Our goal is to develop a latency-tolerant system built out of commodity parts and implemented mostly in software. The proposed system includes a runtime that supports a large number of lightweight contexts, full/empty-bit synchronization, and a memory manager that provides a high-latency but high-bandwidth global shared memory. This paper lays out the vision for our system and justifies its feasibility with a performance analysis of the runtime's ability to tolerate latency.
