On the maturity of parallel applications for asymmetric multi-core processors

Abstract Asymmetric multi-cores (AMCs) are a successful architectural solution for both mobile devices and supercomputers. By maintaining two types of cores (fast and slow), AMCs can provide high performance within the facility power budget. This paper performs the first extensive evaluation of how portable current HPC applications are to such supercomputing systems. Specifically, we evaluate several execution models on an ARM big.LITTLE AMC using the PARSEC benchmark suite, which includes representative highly parallel applications. We compare schedulers at the user, OS and runtime levels, using both static and dynamic options and multiple configurations, and assess the impact of these options on the well-known problem of balancing the load across AMCs. Our results demonstrate that scheduling is most effective when it takes place at the runtime system level, where it improves the baseline by 23%, while the heterogeneous-aware OS scheduling solution improves the baseline by 10%.
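
The load-balancing problem mentioned in the abstract can be illustrated with a toy model. The sketch below is not the paper's methodology: the core speeds, task costs, and function names are all hypothetical, chosen only to show why a static, speed-oblivious split tends to leave fast cores idle while a dynamic scheme that hands out work on demand does not.

```python
import heapq

def static_makespan(task_costs, core_speeds):
    """Assign tasks round-robin, ignoring how fast each core is."""
    finish = [0.0] * len(core_speeds)
    for i, cost in enumerate(task_costs):
        core = i % len(core_speeds)
        finish[core] += cost / core_speeds[core]
    # The slowest core determines when the whole job is done.
    return max(finish)

def dynamic_makespan(task_costs, core_speeds):
    """Give the next task to whichever core becomes idle first."""
    # Heap entries are (time the core becomes free, core index).
    idle = [(0.0, core) for core in range(len(core_speeds))]
    heapq.heapify(idle)
    for cost in task_costs:
        free_at, core = heapq.heappop(idle)
        heapq.heappush(idle, (free_at + cost / core_speeds[core], core))
    return max(free_at for free_at, _ in idle)

# Hypothetical machine: 4 fast cores (2x speed) and 4 slow cores.
speeds = [2.0] * 4 + [1.0] * 4
# Hypothetical workload: 96 tasks of equal cost.
tasks = [1.0] * 96

print("static :", static_makespan(tasks, speeds))
print("dynamic:", dynamic_makespan(tasks, speeds))
```

In this toy setting the static split is bounded by the slow cores, whereas the dynamic queue lets fast cores absorb more tasks. Real schedulers must also account for task dependencies, migration cost, and critical sections, none of which are modelled here.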
