TorusBFS: A Novel Message-Passing Parallel Breadth-First Search Architecture on FPGAs

Graphs are a fundamental data structure used extensively in numerous domains. Breadth-First Search (BFS) is a key component of many graph-based applications, but it suffers from long memory-access latencies. In this paper, we present TorusBFS, a novel message-passing parallel BFS architecture on field-programmable gate arrays (FPGAs). Our architecture uses on-chip memories to store the visitation status of vertices and to implement the current and next queues, which reduces the number of off-chip memory accesses. We also present an on-chip 2-D torus message-passing structure that reduces the latency of exchanging information among processing elements (PEs). Performance is limited by inefficient random write accesses to off-chip memory. As a result, the experiments show that TorusBFS on a single FPGA achieves relatively lower performance than related works on the Convey HC-1/HC-2 platforms. Nevertheless, TorusBFS is the first architecture that can be easily extended to multiple FPGAs in a distributed environment.

Keywords: Breadth-First Search; Graph500; FPGA; Message Passing
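The abstract only outlines the dataflow. The Python sketch below is a software model of that dataflow under stated assumptions, not the authors' hardware design. It makes these assumptions:

- The PEs form a hypothetical k x k torus.
- Vertex v is owned by PE v mod k² (an assumed cyclic partitioning).
- Each PE keeps a visited bitmap for its own vertices, plus current and next frontier queues.
- Every neighbor discovery becomes a (vertex, parent) message delivered to the owning PE.
- Hop counts use dimension-order (X-then-Y) routing with the shorter wraparound direction.

The names `torus_hops` and `torus_bfs` are hypothetical. The paper's actual partitioning and routing may differ.

```python
def torus_hops(src_pe, dst_pe, k):
    """Hops taken by dimension-order (X-then-Y) routing on a k x k torus,
    using the shorter wraparound direction in each dimension (assumed routing)."""
    sx, sy = src_pe % k, src_pe // k
    dx, dy = dst_pe % k, dst_pe // k
    hx, hy = abs(sx - dx), abs(sy - dy)
    return min(hx, k - hx) + min(hy, k - hy)


def torus_bfs(adj, root, k):
    """Level-synchronous BFS simulated on a k x k torus of PEs.

    adj[v] lists the neighbors of vertex v. Returns (parent, total_hops),
    where parent[v] == -1 marks vertices not reachable from root."""
    n, p = len(adj), k * k

    def owner(v):
        return v % p  # hypothetical cyclic vertex-to-PE assignment

    def local(v):
        return v // p  # index of v inside its owner's bitmap

    # Per-PE "on-chip" state: a visited bitmap covering only owned vertices,
    # plus the current frontier queue (next queues are built per level).
    visited = [bytearray(n // p + 1) for _ in range(p)]
    current = [[] for _ in range(p)]
    parent = [-1] * n  # stands in for the off-chip BFS-tree output
    total_hops = 0

    visited[owner(root)][local(root)] = 1
    parent[root] = root
    current[owner(root)].append(root)

    while any(current):
        # Expansion phase: each PE sends a (vertex, parent) message to the
        # owner of every neighbor of its frontier vertices.
        inbox = [[] for _ in range(p)]
        for pe in range(p):
            for v in current[pe]:
                for u in adj[v]:
                    dst = owner(u)
                    total_hops += torus_hops(pe, dst, k)
                    inbox[dst].append((u, v))
        # Receive phase: each PE checks only its own bitmap and fills its next queue.
        nxt = [[] for _ in range(p)]
        for pe in range(p):
            for u, v in inbox[pe]:
                if not visited[pe][local(u)]:
                    visited[pe][local(u)] = 1
                    parent[u] = v
                    nxt[pe].append(u)
        current = nxt  # swap queues for the next level
    return parent, total_hops


if __name__ == "__main__":
    # Six-vertex undirected graph: 0-1, 0-2, 1-3, 2-4, 4-5, on a 2 x 2 torus.
    adj = [[1, 2], [0, 3], [0, 4], [1], [2, 5], [4]]
    print(torus_bfs(adj, root=0, k=2))  # ([0, 0, 0, 1, 2, 4], 10)
```

In this model the visited check happens on the receiving PE. Only a vertex's owner ever reads or writes its visitation bit, so no two PEs contend for the same on-chip bitmap. The `parent` writes model the scattered off-chip writes that the abstract identifies as the performance limit.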
