Improved External Memory BFS Implementation

Breadth-first search (BFS) traversal of massive graphs in external memory was considered non-viable until recently because of the large number of I/Os it incurs. Ajwani et al. [3] showed that the randomized variant of the o(n) I/O algorithm of Mehlhorn and Meyer [24] (MM_BFS) can compute the BFS level decomposition of large graphs (around a billion edges) in a few hours for small-diameter graphs and a few days for large-diameter graphs. We improve upon their implementation of this algorithm by reducing the overhead associated with each BFS level, thereby improving the results for large-diameter graphs, which are more difficult for BFS traversal in external memory. We also present an implementation of the deterministic variant of MM_BFS and show that in most cases it outperforms the randomized variant. The running time of BFS traversal is further improved by a heuristic that preserves the worst-case guarantees of MM_BFS. Together, these improvements reduce the time for BFS on large-diameter graphs from the days reported in [3] to hours. In particular, on line graphs with a random layout on disk, our implementation of the deterministic variant of MM_BFS with the proposed heuristic is more than 75 times faster than the previous best result for the randomized variant of MM_BFS in [3].
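To make the term "BFS level decomposition" concrete, the following is a minimal in-memory sketch, not the external-memory implementation discussed above. It assumes an undirected graph given as an edge list and builds each level from the previous one by the rule used in the level-by-level approach of Munagala and Ranade: the next level is the set of neighbors of the current level minus the current and the preceding level. The function name `bfs_levels` and the input format are illustrative assumptions; an external-memory version would replace the set operations with sorting and scanning of on-disk lists.

```python
from collections import defaultdict


def bfs_levels(edges, source):
    """Return the BFS levels of an undirected graph as a list of sets.

    Illustrative in-memory sketch only; names and input format are assumptions.
    """
    # Build adjacency sets for the undirected graph.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    levels = [{source}]
    previous = set()  # the level before the current one
    while True:
        current = levels[-1]
        # Collect all neighbors of the current level.
        neighbors = set()
        for u in current:
            neighbors |= adj[u]
        # In an undirected graph, neighbors of level t lie only in levels
        # t-1, t, and t+1, so removing the last two levels suffices.
        next_level = neighbors - current - previous
        if not next_level:
            break
        levels.append(next_level)
        previous = current
    return levels


if __name__ == "__main__":
    # A small line graph: the large-diameter case, with one level per node.
    line = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(bfs_levels(line, 0))
```

On a line graph the number of levels grows with the number of nodes, which is why any fixed per-level overhead weighs so heavily on large-diameter inputs.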

[1] Guy E. Blelloch, et al. An Experimental Analysis of a Compact Graph Representation, 2004, ALENEX/ANALC.

[2] Norbert Zeh, et al. I/O-optimal algorithms for planar graphs using separators, 2002, SODA '02.

[3] Jop F. Sibeyn, et al. From parallel to external list ranking, 1997.

[4] Gerth Stølting Brodal, et al. Engineering a Cache-Oblivious Sorting Algorithm, 2004, ALENEX/ANALC.

[5] Jeffery R. Westbrook, et al. A Functional Approach to External Graph Algorithms, 1998, Algorithmica.

[6] Kurt Mehlhorn, et al. External-Memory Breadth-First Search with Sublinear I/O, 2002, ESA.

[7] Guy E. Blelloch, et al. Compact representations of separable graphs, 2003, SODA '03.

[8] Gerth Stølting Brodal, et al. Engineering a cache-oblivious sorting algorithm, 2008, JEAL.

[9] Ulrich Meyer, et al. A computational study of external-memory BFS algorithms, 2006, SODA '06.

[10] Torsten Suel, et al. Design and implementation of a high-performance distributed Web crawler, 2002, Proceedings 18th International Conference on Data Engineering.

[11] Charles E. Leiserson, et al. Cache-Oblivious Algorithms, 2003, CIAC.

[12] Peter Sanders, et al. Engineering an External Memory Minimum Spanning Tree Algorithm, 2004, IFIP TCS.

[13] Lars Arge, et al. On external-memory MST, SSSP and multi-way planar graph separation, 2000, J. Algorithms.

[14] Norbert Zeh, et al. I/O-Efficient Algorithms for Graphs of Bounded Treewidth, 2001, SODA '01.

[15] Gerth Stølting Brodal, et al. Cache Oblivious Distribution Sweeping, 2002, ICALP.

[16] Andrew V. Goldberg, et al. Computing Point-to-Point Shortest Paths from External Memory, 2005, ALENEX/ANALCO.

[17] Norbert Zeh, et al. External Memory Algorithms for Outerplanar Graphs, 1999, ISAAC.

[18] Suresh Venkatasubramanian, et al. On external memory graph traversal, 2000, SODA '00.

[19] Jeffrey Scott Vitter, et al. I/O-Efficient Algorithms for Problems on Grid-Based Terrains, 2001, JEAL.

[20] Alok Aggarwal, et al. The input/output complexity of sorting and related problems, 1988, CACM.

[21] Kamesh Munagala, et al. I/O-complexity of graph algorithms, 1999, SODA '99.

[22] Edward F. Grove, et al. External-memory graph algorithms, 1995, SODA '95.

[23] Marc Najork, et al. Breadth-First Search Crawling Yields High-Quality Pages, 2001.

[24] Ulrich Meyer, et al. Cache-Oblivious Data Structures and Algorithms for Undirected Breadth-First Search and Shortest Paths, 2004, SWAT.