LDRD final report: massive multithreading applied to national infrastructure and informatics.

Large relational datasets such as national-scale social networks and power grids present different computational challenges than do physical simulations. Sandia's distributed-memory supercomputers are well suited for solving problems concerning the latter, but not the former. The reason is that problems such as pattern recognition and knowledge discovery on large networks are dominated by memory latency and not by computation. Furthermore, most memory requests in these applications are very small, and when the datasets are large, most requests miss the cache. The result is extremely low processor utilization. We are unlikely to be able to grow out of this problem with conventional architectures. As the power density of microprocessors has approached that of a nuclear reactor in the past two years, we have seen a leveling of Moore's Law. Building larger and larger microprocessor-based supercomputers is not a solution for informatics and network infrastructure problems, since the additional processors are utilized to only a tiny fraction of their capacity. An alternative solution is to use the paradigm of massive multithreading with a large shared memory. There is only one instance of this paradigm today: the Cray MTA-2. The proposal team has unique experience with and access to this machine. The XMT, which is now being delivered, is a Red Storm machine with up to 8192 multithreaded 'Threadstorm' processors and 128 TB of shared memory. For many years, the XMT will be the only way to address very large graph problems efficiently, and future generations of supercomputers will include multithreaded processors. Roughly 10 MTA processors can process a simple short paths problem in the time taken by the Gordon Bell Prize-nominated distributed-memory code on 32,000 processors of Blue Gene/Light.
We have developed algorithms and open-source software for the XMT, and have modified that software to run some of these algorithms on other multithreaded platforms such as the Sun Niagara and Opteron multi-core chips.
