Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler

We describe a new operating system scheduling algorithm that improves performance isolation on chip multiprocessors (CMPs). Poor performance isolation occurs when an application's performance is determined by the behaviour of its co-runners, i.e., other applications running simultaneously with it. This performance dependency is caused by unfair, co-runner-dependent cache allocation on CMPs. Poor performance isolation interferes with the operating system's control over priority enforcement and hinders QoS provisioning. Previous solutions required modifications to the hardware. We present a new software solution. Our cache-fair algorithm ensures that an application runs as quickly as it would under fair cache allocation, regardless of how the cache is actually allocated. If a thread executes fewer instructions per cycle than it would under fair cache allocation, the scheduler increases that thread's CPU time slice. The thread's overall performance therefore does not suffer, because the longer time on the CPU compensates for its lower instruction rate. We describe our implementation of the algorithm in Solaris™ 10, and show that it significantly improves performance isolation for SPEC CPU, SPEC JBB and TPC-C.
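The time-slice compensation described above can be illustrated with a minimal Python sketch. It is not the Solaris implementation: the function name `adjusted_time_slice`, its parameters, and the proportional scaling rule are assumptions made for illustration, and the sketch takes the fair-allocation IPC as a given input rather than showing how the scheduler would estimate it.

```python
def adjusted_time_slice(base_slice_ms: float, fair_ipc: float, actual_ipc: float) -> float:
    """Return a CPU time slice that compensates a thread for unfair cache allocation.

    Hypothetical helper: scales the slice so that the thread retires about as many
    instructions as it would in `base_slice_ms` at its fair-allocation IPC.
    """
    if base_slice_ms <= 0 or fair_ipc <= 0 or actual_ipc <= 0:
        raise ValueError("slice length and IPC values must be positive")
    if actual_ipc >= fair_ipc:
        # The thread is not running slower than under fair allocation,
        # so this sketch leaves its slice unchanged.
        return base_slice_ms
    # Instructions retired ~ IPC * time, so extending the slice by
    # fair_ipc / actual_ipc restores the fair instruction count.
    return base_slice_ms * fair_ipc / actual_ipc


if __name__ == "__main__":
    # A thread that would reach 1.0 IPC under fair allocation but measures 0.8 IPC.
    print(adjusted_time_slice(10.0, fair_ipc=1.0, actual_ipc=0.8))
```

In this sketch only under-performing threads are adjusted, matching the case the abstract describes; how the extra CPU time is balanced against other threads is left out.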
