Improving Streaming Graph Processing Performance using Input Knowledge

Streaming graphs are ubiquitous in today’s big data era. Prior work has improved the performance of streaming graph workloads without taking input characteristics into account. In this work, we demonstrate that input knowledge-driven software and hardware co-design is critical for optimizing the performance of streaming graph processing. To improve graph update efficiency, we first characterize the performance trade-offs of input-oblivious batch reordering. Guided by our findings, we propose input-aware batch reordering, which adaptively reorders input batches based on their degree distributions. To complement adaptive batch reordering, we propose performing graph updates either in software (via update search coalescing) or in hardware (via acceleration support), with the choice made dynamically based on input characteristics. To improve graph computation efficiency, we present input-aware work aggregation, which adaptively modulates the computation granularity based on inter-batch locality characteristics. Evaluated across 260 workloads, our input-aware techniques improve graph update performance by 4.55× and 2.6× on average for different input types, on top of eliminating the performance degradation caused by input-oblivious batch reordering. Graph compute performance improves by 1.26× (up to 2.7×).
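The abstract describes input-aware batch reordering and update search coalescing only at a high level. The following is a minimal sketch of the general idea under stated assumptions, not the paper's actual heuristic or implementation: a batch is a list of (source, destination) edge insertions, the graph is a dict of adjacency lists, and the skew metric (share of a batch's edges owned by the top 1% of source vertices), the 0.2 threshold, and the functions `degree_skew`, `maybe_reorder`, and `apply_batch` are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source, destination) edge insertion


def degree_skew(batch: List[Edge]) -> float:
    """Hypothetical skew metric: fraction of the batch's edges whose source
    is among the top 1% (at least one) most frequent source vertices."""
    counts = sorted(Counter(src for src, _ in batch).values(), reverse=True)
    top_k = max(1, len(counts) // 100)
    return sum(counts[:top_k]) / len(batch)


def maybe_reorder(batch: List[Edge], threshold: float = 0.2) -> Tuple[List[Edge], bool]:
    """Input-aware reordering (assumed policy): sort the batch by source
    vertex only when it looks skewed enough for the sort to pay off."""
    if batch and degree_skew(batch) >= threshold:
        return sorted(batch, key=itemgetter(0)), True  # stable sort by source
    return list(batch), False


def apply_batch(graph: Dict[int, List[int]], batch: List[Edge]) -> int:
    """Insert edges, coalescing the duplicate-check search: each run of
    consecutive edges with the same source scans that source's adjacency
    list once instead of once per edge. Returns the number of scans."""
    scans = 0
    for src, run in groupby(batch, key=itemgetter(0)):
        adj = graph.setdefault(src, [])
        present = set(adj)  # single scan shared by the whole run
        scans += 1
        for _, dst in run:
            if dst not in present:  # skip edges that already exist
                adj.append(dst)
                present.add(dst)
    return scans
```

In this sketch, sorting a skewed batch turns scattered updates to a hub vertex into one contiguous run, so `apply_batch` searches that vertex's adjacency list once rather than once per edge. For a near-uniform batch, the sort still costs O(b log b) for a batch of b edges but creates few runs worth coalescing, which is the kind of trade-off an input-aware policy is meant to avoid.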
