TuNao: A High-Performance and Energy-Efficient Reconfigurable Accelerator for Graph Processing

Large-scale graph processing is now a crucial task of many commercial applications, and it is conventionally supported by general-purpose processors. These processors are designed to flexibly support highly diverse workloads with classic techniques such as on-chip cache and dynamic pipelining. Yet, it is difficult for the on-chip cache to exploit irregular data locality in large-scale graph processing, even though there are a few high-degree vertices that are frequently accessed in real-world graphs, it is not efficient to perform regular arithmetic operations via sophisticated dynamic pipelining. In short, general-purpose processors could not be the ideal platforms to graph processing. In this paper, we design a reconfigurable graph processing accelerator, with the purpose of providing an energy-efficient and flexible hardware platform for large-scale graph processing. This accelerator features two main components, i.e., the on-chip storage to exploit the data locality of graph processing, and the reconfigurable functional units to adapt to diversified operations in different graph processing tasks. On a total of 36 practical graph processing tasks, we demonstrate that, on average, our accelerator design achieves 1.58x and 25.56x better performance and energy efficiency, respectively, than the GPU baseline.

[1]  Joseph Gonzalez,et al.  PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[2]  Kiyoung Choi,et al.  A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[3]  Margaret Martonosi,et al.  Graphicionado: A high-performance and energy-efficient accelerator for graph analytics , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[4]  Paolo Ienne,et al.  Elastic CGRAs , 2013, FPGA '13.

[5]  Keval Vora,et al.  CuSha: vertex-centric graph processing on GPUs , 2014, HPDC '14.

[6]  Guy E. Blelloch,et al.  GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.

[7]  Willy Zwaenepoel,et al.  X-Stream: edge-centric graph processing using streaming partitions , 2013, SOSP.

[8]  Chao Wang,et al.  SODA: Software defined FPGA based accelerators for big data , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[9]  Michalis Faloutsos,et al.  On power-law relationships of the Internet topology , 1999, SIGCOMM '99.