Efficient Situational Scheduling of Graph Workloads on Single-Chip Multicores and GPUs

Situational, dynamic changes in the behavior of graph analytic algorithm implementations create efficiency challenges on concurrent hardware such as GPUs and large-scale multicores. These performance variations stem from properties of the input, such as the density and degree of the graph being processed. Concurrency control is therefore difficult, because the complex data-dependent behavior of these workloads admits a wide range of plausible algorithmic and architectural choices. This article addresses how to efficiently exploit the multidimensional search space of such choices for graph analytic workloads in a real-time execution environment. A key insight is that tuning architectural choices alone is sufficient to yield a concurrency control setting comparable to the optimal setup, which tunes both algorithmic and architectural choices. The authors propose a situationally adaptive scheduler (SAS) that learns the architectural choices offline using synthetically generated graphs. In a real-time setup, SAS-assisted execution delivers geometric-mean performance gains of 40 percent on a large-scale GPU (Nvidia GTX 970), 35 percent on a smaller GPU (Nvidia GTX 750 Ti), and 30 percent on a large-scale multicore (Intel Xeon Phi).
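
The abstract describes SAS only at a high level: architectural choices are learned offline from synthetic graphs and then applied at run time without further search. The following Python sketch illustrates that offline-train, online-predict pattern under loudly stated assumptions. The feature set (average degree and density), the single tunable knob (thread count), the candidate values, the nearest-neighbor lookup, and the `toy_cost` function are all hypothetical stand-ins chosen for illustration; the article's actual learner, features, and measurements are not given here. In a real deployment, the `measure` callback would time the workload on the target GPU or multicore instead of evaluating a toy formula.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class GraphFeatures:
    """Hypothetical summary of an input graph (not an API from the article)."""
    num_vertices: int
    num_edges: int

    @property
    def avg_degree(self) -> float:
        return self.num_edges / self.num_vertices

    @property
    def density(self) -> float:
        # Directed-graph density |E| / (|V| * (|V| - 1)); assumes |V| > 1.
        n = self.num_vertices
        return self.num_edges / (n * (n - 1))


def feature_vector(g: GraphFeatures) -> Tuple[float, float]:
    # Log scale so graphs of very different sizes land in a comparable range.
    return (math.log10(g.avg_degree), math.log10(g.density))


def toy_cost(g: GraphFeatures, threads: int) -> float:
    # HYPOTHETICAL stand-in for a real timing run: the work term shrinks as it
    # is divided across threads, while a per-thread overhead term grows.
    return 1000.0 * g.avg_degree / threads + threads


class NearestNeighborScheduler:
    """Hypothetical scheduler: offline table mapping features to a thread count."""

    def __init__(self) -> None:
        self.table: List[Tuple[Tuple[float, float], int]] = []

    def train(self, graphs: Sequence[GraphFeatures], candidates: Sequence[int],
              measure: Callable[[GraphFeatures, int], float]) -> None:
        # Offline phase: try every candidate setting on each synthetic
        # graph and record the fastest one.
        for g in graphs:
            best = min(candidates, key=lambda t: measure(g, t))
            self.table.append((feature_vector(g), best))

    def predict(self, g: GraphFeatures) -> int:
        # Online phase: no search, just return the setting recorded for the
        # closest profiled graph in feature space.
        x = feature_vector(g)
        _, best = min(self.table, key=lambda row: math.dist(row[0], x))
        return best


if __name__ == "__main__":
    sched = NearestNeighborScheduler()
    synthetic = [GraphFeatures(1000, 1000), GraphFeatures(1000, 100000)]
    sched.train(synthetic, [2 ** k for k in range(10)], toy_cost)
    print(sched.predict(GraphFeatures(2000, 2500)))  # sparse input, prints 32
    print(sched.predict(GraphFeatures(500, 40000)))  # dense input, prints 256
```

Because the search over candidate settings happens only offline, each run-time scheduling decision reduces to a lookup over the profiled table, which is the property that makes this style of approach plausible in a real-time environment.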
