Paper Information - Balancing Memory Accesses for Energy-Efficient Graph Analytics Accelerators

Domain-specific accelerators for graph analytics use a large on-chip memory to handle intensive random memory accesses, offering higher performance and energy efficiency than conventional architectures. However, because they use on-chip memory inefficiently, current accelerators suffer from energy and performance bottlenecks caused by the large number of off-chip memory accesses. In this work, we introduce an online preprocessing step for the vertex-centric programming model, based on our observation that memory bandwidth utilization is imbalanced between the two execution phases. Our scheme improves energy efficiency and performance by significantly reducing off-chip accesses in two ways. First, we sequence random off-chip memory accesses to balance memory bandwidth demands and improve the utilization of on-chip memory. Second, we prune active leaf vertices to avoid redundant memory accesses. We evaluate our method on a state-of-the-art graph analytics accelerator and achieve a 1.6× speedup while reducing energy consumption by 42% on average.
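The leaf-pruning idea can be illustrated in software with a small sketch. It is not the paper's hardware mechanism; it only shows why activating leaf vertices is redundant in a vertex-centric, two-phase (scatter, then apply) BFS. It assumes that "leaf vertex" means a vertex with no outgoing edges, and the function name `bfs_vertex_centric`, the `prune_leaves` flag, and the `active_entries` counter (a stand-in for active-list traffic) are all hypothetical names introduced for illustration.

```python
from collections import defaultdict


def bfs_vertex_centric(num_vertices, edges, source, prune_leaves=False):
    """Level-synchronous BFS written in a scatter/apply style.

    Returns (levels, active_entries). active_entries counts how many
    vertices are written to the active list over the whole run; it is
    a rough, assumed proxy for memory traffic, not a hardware model.
    """
    out_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append(dst)

    inf = float("inf")
    levels = [inf] * num_vertices
    levels[source] = 0
    active = [source]
    active_entries = 1

    while active:
        # Scatter phase: every active vertex proposes a level to its
        # out-neighbours; proposals are reduced with min().
        proposals = {}
        for u in active:
            for v in out_edges.get(u, []):
                candidate = levels[u] + 1
                if candidate < proposals.get(v, inf):
                    proposals[v] = candidate

        # Apply phase: commit improved levels and build the next active list.
        next_active = []
        for v, candidate in proposals.items():
            if candidate < levels[v]:
                levels[v] = candidate
                # A vertex without out-edges has nothing to scatter, so
                # activating it only adds traffic (assumed leaf definition).
                if prune_leaves and not out_edges.get(v):
                    continue
                next_active.append(v)

        active_entries += len(next_active)
        active = next_active

    return levels, active_entries


if __name__ == "__main__":
    demo_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4)]
    print(bfs_vertex_centric(5, demo_edges, 0))
    print(bfs_vertex_centric(5, demo_edges, 0, prune_leaves=True))
```

On the small demo graph, both runs compute the same BFS levels, while the pruned run writes fewer entries to the active list because vertices 3 and 4 have no outgoing edges and are never activated.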
