Optimizing Graph Algorithms in Asymmetric Multicore Processors

Asymmetric multicore processors (AMPs) are a subcategory of modern heterogeneous multicore architectures in which cores of different types execute a common instruction set architecture. The inherent performance asymmetry among the cores of an AMP poses interesting challenges. Irregular workloads, such as graph algorithms, intensify these challenges because their parallel workloads cannot be precisely characterized at compile time. In this paper, we propose a framework named scheduler for irregular AMPs, which improves the efficiency of a given AMP system for a given algorithm-graph pair. The optimization is performed in two stages: 1) finding an optimal graph representation and 2) using a predictor to find an optimal hardware configuration on which to run the input algorithm-graph pair. We have tested our system on five graph algorithms over eight real-world and synthetic graphs. On average, we see a 42.82% improvement in energy-delay product over the base case.
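
To make the reported metric concrete, the following minimal sketch computes the energy-delay product (EDP, energy multiplied by execution time) and the percentage improvement of a chosen configuration over a base case. It is an illustration only, not the framework's implementation: the `Measurement` type, the configuration labels, and all numbers are hypothetical, and the improvement formula (relative reduction in EDP) is an assumption about how such a percentage is typically derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Hypothetical record of one run of an algorithm-graph pair."""
    label: str            # illustrative name of the hardware configuration
    energy_joules: float  # total energy consumed by the run
    delay_seconds: float  # wall-clock execution time of the run


def edp(m: Measurement) -> float:
    """Energy-delay product: energy multiplied by execution time."""
    return m.energy_joules * m.delay_seconds


def edp_improvement_percent(base: Measurement, candidate: Measurement) -> float:
    """Relative EDP reduction of candidate over base (assumed definition)."""
    return 100.0 * (edp(base) - edp(candidate)) / edp(base)


def best_configuration(candidates: list[Measurement]) -> Measurement:
    """Pick the candidate with the lowest EDP (exhaustive, no predictor)."""
    return min(candidates, key=edp)


# Made-up numbers, chosen only to exercise the functions above.
base = Measurement("base case", energy_joules=10.0, delay_seconds=2.0)
candidates = [
    Measurement("4 big + 0 little", energy_joules=12.0, delay_seconds=1.5),
    Measurement("2 big + 4 little", energy_joules=8.0, delay_seconds=1.75),
    Measurement("0 big + 4 little", energy_joules=6.0, delay_seconds=4.0),
]

best = best_configuration(candidates)
improvement = edp_improvement_percent(base, best)
print(f"{best.label}: {improvement:.2f}% EDP improvement over the base case")
```

The sketch selects a configuration by measuring every candidate; the framework described above instead uses a predictor, presumably so that the algorithm-graph pair need not be run on every configuration.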
