Architectural support for enhanced SMT job scheduling

By converting thread-level parallelism to instruction level parallelism, simultaneous multithreaded (SMT) processors are emerging as effective ways to utilize the resources of modern superscalar architectures. However, the full potential of SMT has not yet been reached as most modern operating systems use existing single-thread or multiprocessor algorithms to schedule threads, neglecting contention for resources between threads. To date, even the best SMT scheduling algorithms simply try to group threads for co-residency based on each thread's expected resource utilization but do not take into account variance in thread behavior. As such, we introduce architectural support that enables new thread scheduling algorithms to group threads for co-residency based on fine-grain memory system activity information. The proposed memory monitoring framework centers on the concept of a cache activity vector, which exposes runtime cache resource information to the operating system to improve job scheduling. Using this scheduling technique, we experimentally evaluate the overall performance improvement of workloads on an SMT machine compared against the most recent Linux job scheduler. This work is first motivated with experiments in a simulated environment, then validated on a hyperthreading-enabled Intel Pentium-4 Xeon microprocessor running a modified version of the latest Linux kernel.

[1]  Dean M. Tullsen,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[2]  Daniel A. Connors,et al.  Implementation of fine-grained cache monitoring for improved SMT scheduling , 2004, IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings..

[3]  Daniel A. Connors,et al.  Predictable Fine-Grained Cache Behavior for Enhanced Simultaneous Multithreading (SMT) Scheduling , 2004 .

[4]  Sebastien Hily,et al.  Contention on 2nd Level Cache May Limit the Effectiveness of Simultaneous Multithreading , 1997 .

[5]  Jack L. Lo,et al.  Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[6]  Dean M. Tullsen,et al.  Simultaneous multithreading: a platform for next-generation processors , 1997, IEEE Micro.

[7]  Scott A. Mahlke,et al.  IMPACT: An Architectural Framework for Multiple-Instruction-Issue Processors , 1998, 25 Years ISCA: Retrospectives and Reprints.

[8]  Dean M. Tullsen,et al.  Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading , 1997, TOCS.

[9]  Susan J. Eggers,et al.  An analysis of database workload performance on simultaneous multithreaded processors , 1998, ISCA.

[10]  Josep Torrellas,et al.  A Chip-Multiprocessor Architecture with Speculative Multithreading , 1999, IEEE Trans. Computers.

[11]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[12]  M TullsenDean,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000 .

[13]  Brad Calder,et al.  A co-phase matrix to guide simultaneous multithreading simulation , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[14]  Susan J. Eggers,et al.  Thread-Sensitive Scheduling for SMT Processors , 2000 .

[15]  Susan J. Eggers,et al.  An analysis of operating system behavior on a simultaneous multithreaded architecture , 2000, ASPLOS IX.

[16]  D. Tullsen,et al.  ILP versus TLP on SMT , 1999, ACM/IEEE SC 1999 Conference (SC'99).

[17]  Paul Chow,et al.  Prescient Instruction Prefetch , 2002 .

[18]  Dean M. Tullsen,et al.  Tuning Compiler Optimizations for Simultaneous Multithreading , 2004, International Journal of Parallel Programming.

[19]  John Paul Shen,et al.  Hardware Support for Prescient Instruction Prefetch , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[20]  Sebastien Hily,et al.  Standard Memory Hierarchy Does Not Fit Simultaneous Multithreading , 1998 .

[21]  Brad Calder,et al.  Time Varying Behavior of Programs , 1999 .

[22]  G. Edward Suh,et al.  A new memory monitoring scheme for memory-aware scheduling and partitioning , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[23]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[24]  Dean M. Tullsen,et al.  Compiling for instruction cache performance on a multithreaded architecture , 2002, MICRO.

[25]  Dean M. Tullsen,et al.  Supporting fine-grained synchronization on a simultaneous multithreading processor , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.