Compatible phase co-scheduling on a CMP of multi-threaded processors

The industry is rapidly moving towards the adoption of chip multi-processors (CMPs) of simultaneous multi-threaded (SMT) cores for general purpose systems. The most prominent use of such processors, at least in the near term, is as job servers running multiple independent threads on the different contexts of the various SMT cores. In such an environment, the co-scheduling of phases from different threads plays a significant role in the overall throughput. Less throughput is achieved when phases from different threads that conflict for particular hardware resources are scheduled together, compared with the situation where compatible phases are co-scheduled on the same SMT core. Achieving the latter requires precise per-phase hardware statistics that the scheduler can use to rapidly identify possible incompatibilities among phases of different threads, thereby avoiding the potentially high performance cost of inter-thread contention. In this paper, we devise phase co-scheduling policies for a dual-core CMP of dual-threaded SMT processors. We explore a number of approaches and find that the use of ready and in-flight instruction metrics permits effective co-scheduling of compatible phases among the four contexts. This approach significantly outperforms the worst static grouping of threads, and very closely matches the best static grouping, even outperforming it by as much as 7%

[1]  Dean M. Tullsen,et al.  Symbiotic jobscheduling with priorities for a simultaneous multithreading processor , 2002, SIGMETRICS '02.

[2]  Manoj Franklin,et al.  Balancing thoughput and fairness in SMT processors , 2001, 2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS..

[3]  Jack L. Lo,et al.  Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[4]  Margo I. Seltzer,et al.  Performance of Multithreaded Chip Multiprocessors and Implications for Operating System Design , 2005, USENIX Annual Technical Conference, General Track.

[5]  Allan Snavely Explorations in Symbiosis on two Multithreaded Architectures , 1999 .

[6]  A. Janiszewski,et al.  Architectural support for enhanced SMT job scheduling , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[7]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[8]  Yan Solihin,et al.  Predicting inter-thread cache contention on a chip multi-processor architecture , 2005, 11th International Symposium on High-Performance Computer Architecture.

[9]  D. Marr,et al.  Hyper-Threading Technology Architecture and MIcroarchitecture , 2002 .

[10]  Guang R. Gao,et al.  Multithreaded Execution Architecture and Compilation , 1999, HPCA.

[11]  David H. Albonesi,et al.  Front-end policies for improved issue efficiency in SMT processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[12]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[13]  Steven K. Reinhardt,et al.  The impact of resource partitioning on SMT processors , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[14]  David A. Koufaty,et al.  Hyperthreading Technology in the Netburst Microarchitecture , 2003, IEEE Micro.

[15]  Daniel A. Connors,et al.  Implementation of fine-grained cache monitoring for improved SMT scheduling , 2004, IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings..

[16]  Dean M. Tullsen,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[17]  Rohit Bhatia,et al.  Montecito: a dual-core, dual-thread Itanium processor , 2005, IEEE Micro.

[18]  Susan J. Eggers,et al.  Thread-Sensitive Scheduling for SMT Processors , 2000 .

[19]  Francisco J. Cazorla,et al.  Dynamically Controlled Resource Allocation in SMT Processors , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).