A study on optimally co-scheduling jobs of different lengths on chip multiprocessors

Cache sharing in Chip Multiprocessors brings cache contention among corunning processes, which often causes considerable degradation of program performance and system fairness. Recent studies have seen the effectiveness of job co-scheduling in alleviating the contention. But finding optimal schedules is challenging. Previous explorations tackle the problem under highly constrained settings. In this work, we show that relaxing those constraints, particularly the assumptions on job lengths and reschedulings, increases the complexity of the problem significantly. Subsequently, we propose a series of algorithms to compute or approximate the optimal schedules in the more general setting. Specifically, we propose an A*-based approach to accelerating the search for optimal schedules by as much as several orders of magnitude. For large problems, we design and evaluate two approximation algorithms, A*-cluster and local-matching algorithms, to effectively approximate the optimal schedules with good accuracy and high scalability. This study contributes better understanding to the optimal co-scheduling problem, facilitates the evaluation of co-scheduling systems, and offers insights and techniques for the development of practical co-scheduling algorithms.

[1]  Dean M. Tullsen,et al.  Initial observations of the simultaneous multithreading Pentium 4 processor , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[2]  Ian Pratt,et al.  Hyper-Threading Aware Process Scheduling Heuristics , 2005, USENIX Annual Technical Conference, General Track.

[3]  Yan Solihin,et al.  Predicting inter-thread cache contention on a chip multi-processor architecture , 2005, 11th International Symposium on High-Performance Computer Architecture.

[4]  Peter J. Denning,et al.  Thrashing: its causes and prevention , 1968, AFIPS Fall Joint Computing Conference.

[5]  Srihari Makineni,et al.  Communist, Utilitarian, and Capitalist cache policies on CMPs: Caches as a shared resource , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[6]  Chen Ding,et al.  Locality approximation using time , 2007, POPL '07.

[7]  A. Janiszewski,et al.  Architectural support for enhanced SMT job scheduling , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[8]  G. Edward Suh,et al.  Analytical cache models with applications to cache partitioning , 2001, ICS '01.

[9]  Jaehyuk Huh,et al.  A NUCA Substrate for Flexible CMP Cache Sharing , 2007, IEEE Transactions on Parallel and Distributed Systems.

[10]  Norman P. Jouppi,et al.  Core architecture optimization for heterogeneous chip multiprocessors , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[11]  Won-Taek Lim,et al.  Architectural support for operating system-driven CMP cache management , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[12]  Michael D. Smith,et al.  Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[13]  Xiao Zhang,et al.  Processor Hardware Counter Statistics as a First-Class System Resource , 2007, HotOS.

[14]  Sandhya Dwarkadas,et al.  Compatible phase co-scheduling on a CMP of multi-threaded processors , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[15]  Dean M. Tullsen,et al.  Symbiotic jobscheduling with priorities for a simultaneous multithreading processor , 2002, SIGMETRICS '02.

[16]  Dean M. Tullsen,et al.  Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[17]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[18]  M TullsenDean,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000 .

[19]  Nils J. Nilsson,et al.  Artificial Intelligence , 1974, IFIP Congress.

[20]  Jie Chen,et al.  Analysis and approximation of optimal co-scheduling on Chip Multiprocessors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[21]  Susan J. Eggers,et al.  Thread-Sensitive Scheduling for SMT Processors , 2000 .

[22]  Jun Nakajima,et al.  Enhancements for hyper-threading technology in the operating system: seeking the optimal scheduling , 2002, WIESS'02.

[23]  G. Edward Suh,et al.  A new memory monitoring scheme for memory-aware scheduling and partitioning , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[24]  Jack Edmonds,et al.  Maximum matching and a polyhedron with 0,1-vertices , 1965 .

[25]  S. Kim,et al.  Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[26]  Dean M. Tullsen,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[27]  Yale N. Patt,et al.  Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).