Hardware-aware Thread Scheduling: The Case of Asymmetric Multicore Processors

Modern processor architectures are increasingly complex and heterogeneous, and often require solutions tailored to the characteristics of each processor model. In this paper we address this problem by using the AMD Bulldozer processor as a case study for hardware-oriented performance optimizations. The Bulldozer architecture features an asymmetric simultaneous multithreading implementation with shared floating-point units (FPUs) and per-core arithmetic logic units (ALUs). BulldOver, presented in this paper, improves thread scheduling by exploiting this hardware characteristic to increase the performance of floating-point-intensive workloads on Linux-based operating systems. BulldOver is a user-space monitoring tool that automatically identifies FPU-intensive threads and schedules them more efficiently, without requiring any patches or modifications at the kernel level. Our measurements with standard benchmark suites show that speedups of up to 10% can be achieved simply by letting BulldOver monitor applications, without any modification of the workload.
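
The placement idea described above can be illustrated with a minimal sketch. It is not the BulldOver implementation: the function names, the `fpu_ratio` input (assumed to be the fraction of floating-point work per thread, derived elsewhere from hardware counters), the 0.3 threshold, and the module layout are all hypothetical. The sketch only shows one plausible policy: spread FPU-intensive threads across groups of cores that share an FPU, then fill the remaining cores with the other threads.

```python
import os
from typing import Dict, List


def plan_placement(fpu_ratio: Dict[int, float],
                   modules: List[List[int]],
                   threshold: float = 0.3) -> Dict[int, int]:
    """Map thread id -> core id, spreading FPU-heavy threads across modules.

    fpu_ratio: hypothetical per-thread share of floating-point work (0..1).
    modules:   list of core-id groups; cores in one group share an FPU.
    threshold: hypothetical cut-off above which a thread counts as FPU-heavy.
    """
    # Place the most FPU-intensive threads first.
    order = sorted(fpu_ratio, key=lambda t: fpu_ratio[t], reverse=True)
    free = [list(cores) for cores in modules]  # unassigned cores per module
    fpu_load = [0] * len(modules)              # FPU-heavy threads per module
    placement: Dict[int, int] = {}
    for tid in order:
        candidates = [m for m in range(len(modules)) if free[m]]
        if not candidates:
            break  # more threads than cores: leave the rest to the OS scheduler
        if fpu_ratio[tid] >= threshold:
            # FPU-heavy: choose the module with the fewest FPU-heavy threads.
            m = min(candidates, key=lambda i: (fpu_load[i], i))
            fpu_load[m] += 1
        else:
            # Other threads: pair them with FPU-heavy threads where possible.
            m = min(candidates, key=lambda i: (-fpu_load[i], i))
        placement[tid] = free[m].pop(0)
    return placement


def apply_placement(placement: Dict[int, int]) -> None:
    """Pin each thread to its planned core from user space (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        return  # affinity control is not available on this platform
    for tid, core in placement.items():
        os.sched_setaffinity(tid, {core})
```

For example, with two modules `[[0, 1], [2, 3]]` and two FPU-heavy threads, the plan puts the heavy threads on cores 0 and 2, so they do not compete for the same shared FPU. Setting affinity through `os.sched_setaffinity` needs no kernel changes, which matches the user-space approach described in the abstract.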
