Ligra: a lightweight graph processing framework for shared memory

There has been significant recent interest in parallel frameworks for processing graphs due to their applicability in studying social networks, the Web graph, networks in biology, and unstructured meshes in scientific simulation. Due to the desire to process large graphs, these systems have emphasized the ability to run on distributed memory machines. Today, however, a single multicore server can support more than a terabyte of memory, enough to hold graphs with tens or even hundreds of billions of edges. Furthermore, for graph algorithms, shared-memory multicores are generally significantly more efficient on a per-core, per-dollar, and per-joule basis than distributed memory systems, and shared-memory algorithms tend to be simpler than their distributed counterparts. In this paper, we present a lightweight graph processing framework designed specifically for shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write. The framework has two very simple routines, one for mapping over edges and one for mapping over vertices. Both routines can be applied to any subset of the vertices, which makes the framework useful for the many graph traversal algorithms that operate on such subsets. Based on recent ideas used in a very fast algorithm for breadth-first search (BFS), our routines automatically adapt to the density of vertex sets. We implement several algorithms in this framework, including BFS, graph radii estimation, graph connectivity, betweenness centrality, PageRank, and single-source shortest paths. Our algorithms expressed using this framework are very simple and concise, and perform almost as well as highly optimized code. Furthermore, they get good speedups on a 40-core machine and are significantly more efficient than previously reported results using graph frameworks on machines with many more cores.
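To make the two routines concrete, the following is a minimal sequential sketch of how a map over vertices and a density-adaptive map over edges could be used to express BFS. It is an illustration only, not the framework's actual interface: the names `vertex_map`, `edge_map`, `transpose`, and `bfs`, the `update`/`cond` callback convention, and the `threshold_divisor` parameter (with its default of 20) are all assumptions made for this example, and the parallelism that the real framework provides is omitted.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[int, List[int]]  # adjacency list: vertex -> out-neighbours


def transpose(graph: Graph) -> Graph:
    """Build in-neighbour lists, used by the dense (pull) traversal."""
    in_edges: Graph = {v: [] for v in graph}
    for u, neighbours in graph.items():
        for v in neighbours:
            in_edges[v].append(u)
    return in_edges


def vertex_map(subset: Set[int], f: Callable[[int], bool]) -> Set[int]:
    """Apply f to each vertex of the subset; keep those where f returns True."""
    return {v for v in subset if f(v)}


def edge_map(graph: Graph, in_edges: Graph, frontier: Set[int],
             update: Callable[[int, int], bool],
             cond: Callable[[int], bool],
             threshold_divisor: float = 20) -> Set[int]:
    """Apply update(u, v) to edges leaving the frontier; return the new subset.

    The traversal direction is chosen from the density of the frontier.
    The threshold used here is an assumption of this sketch.
    """
    num_edges = sum(len(ns) for ns in graph.values())
    work = len(frontier) + sum(len(graph[u]) for u in frontier)
    out: Set[int] = set()
    if work > num_edges / threshold_divisor:
        # Dense frontier: every eligible vertex pulls from its in-neighbours.
        for v in graph:
            if not cond(v):
                continue
            for u in in_edges[v]:
                if u in frontier and update(u, v):
                    out.add(v)
                if not cond(v):
                    break  # v is settled; skip its remaining in-edges
    else:
        # Sparse frontier: each frontier vertex pushes along its out-edges.
        for u in frontier:
            for v in graph[u]:
                if cond(v) and update(u, v):
                    out.add(v)
    return out


def bfs(graph: Graph, root: int, threshold_divisor: float = 20) -> Dict[int, int]:
    """Return a BFS parent map; unreachable vertices keep parent -1."""
    parents = {v: -1 for v in graph}
    parents[root] = root
    in_edges = transpose(graph)

    def update(u: int, v: int) -> bool:
        if parents[v] == -1:
            parents[v] = u
            return True
        return False

    def cond(v: int) -> bool:
        return parents[v] == -1

    frontier = {root}
    while frontier:
        frontier = edge_map(graph, in_edges, frontier, update, cond,
                            threshold_divisor)
    return parents
```

Under these assumptions, `bfs(graph, 0)` on a small adjacency list returns a parent for every reachable vertex, and the choice between the push and pull loops changes only how the edges are visited, not which vertices end up in the next frontier. In this sketch a vertex-side step such as filtering a subset would be written `vertex_map(subset, predicate)`.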
