Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors

This paper explores thread scheduling on an increasingly popular architecture: chip multiprocessors with simultaneous multithreading cores. Conventional multiprocessor scheduling, applied to this architecture, will attempt to balance the thread load across cores. This research demonstrates that such an approach eliminates one of the big advantages of this architecture - the ability to use unbalanced schedules to allocate the right amount of execution resources to each thread. However, accommodating unbalanced schedules creates several difficulties, the biggest being the fact that the search space of all schedules (both balanced and unbalanced) is much greater than that of the balanced schedules alone. This work proposes and evaluates scheduling policies that allow the system to identify and migrate toward good thread schedules, whether the best schedules are balanced or unbalanced

[1]  Yen-Kuang Chen,et al.  The energy efficiency of CMP vs. SMT for multimedia workloads , 2004, ICS '04.

[2]  Dean M. Tullsen,et al.  Symbiotic jobscheduling with priorities for a simultaneous multithreading processor , 2002, SIGMETRICS '02.

[3]  William J. Dally,et al.  Migration in Single Chip Multiprocessors , 2002, IEEE Computer Architecture Letters.

[4]  Susan J. Eggers,et al.  Thread-Sensitive Scheduling for SMT Processors , 2000 .

[5]  Dean M. Tullsen,et al.  The Danger of Interval-Based Power Efficiency Metrics: When Worst Is Best , 2005, IEEE Computer Architecture Letters.

[6]  Dean M. Tullsen,et al.  Fellowship - Simulation And Modeling Of A Simultaneous Multithreading Processor , 1996, Int. CMG Conference.

[7]  Dirk Grunwald,et al.  Aide de Camp: Asymmetric Dual Core Design for Power and Energy Reduction ; CU-CS-964-03 , 2003 .

[8]  Dean M. Tullsen,et al.  Handling long-latency loads in a simultaneous multithreading processor , 2001, MICRO.

[9]  T. N. Vijaykumar,et al.  Heat-and-run: leveraging SMT and CMP to manage power density through the operating system , 2004, ASPLOS XI.

[10]  Jack L. Lo,et al.  Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[11]  Dean M. Tullsen,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[12]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[13]  Margo I. Seltzer,et al.  Performance of Multithreaded Chip Multiprocessors and Implications for Operating System Design , 2005, USENIX Annual Technical Conference, General Track.

[14]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[15]  Norman P. Jouppi,et al.  Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction , 2003, MICRO.

[16]  A. Janiszewski,et al.  Architectural support for enhanced SMT job scheduling , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[17]  Balaram Sinharoy,et al.  Design and implementation of the POWER5 microprocessor , 2004, Proceedings. 41st Design Automation Conference, 2004..

[18]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[19]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.