Cache-Hierarchy Contention-Aware Scheduling in CMPs

To improve chip multiprocessor (CMP) performance, recent research has focused on scheduling strategies to mitigate main memory bandwidth contention. Nowadays, commercial CMPs implement multilevel cache hierarchies that are shared by several multithreaded cores. In this microprocessor design, contention points may appear along the whole memory hierarchy. Moreover, this problem is expected to aggravate in future technologies, since the number of cores and hardware threads, and consequently the size of the shared caches increase with each microprocessor generation. This paper characterizes the impact on performance of the different contention points that appear along the memory subsystem. The analysis shows that some benchmarks are more sensitive to contention in higher levels of the memory hierarchy (e.g., shared L2) than to main memory contention. In this paper, we propose two generic scheduling strategies for CMPs. The first strategy takes into account the available bandwidth at each level of the cache hierarchy. The strategy selects the processes to be coscheduled and allocates them to cores to minimize contention effects. The second strategy also considers the performance degradation each process suffers due to contention-aware scheduling. Both proposals have been implemented and evaluated in a commercial single-threaded quad-core processor with a relatively small two-level cache hierarchy. The proposals reach, on average, a performance improvement by 5.38 and 6.64 percent when compared with the Linux scheduler, while this improvement is by 3.61 percent for an state-of-the-art memory contention-aware scheduler under the evaluated mixes.

[1]  Dimitrios S. Nikolopoulos,et al.  Realistic Workload Scheduling Policies for Taming the Memory Bandwidth Bottleneck of SMPs , 2004, HiPC.

[2]  Pen-Chung Yew,et al.  On mitigating memory bandwidth contention through bandwidth-aware scheduling , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[3]  Yan Solihin,et al.  Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[4]  Yale N. Patt,et al.  Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[5]  Nectarios Koziris,et al.  Memory and network bandwidth aware scheduling of multiprogrammed workloads on clusters of SMPs , 2006, 12th International Conference on Parallel and Distributed Systems - (ICPADS'06).

[6]  Ken Smits,et al.  Penryn: 45-nm next generation Intel® core™ 2 processor , 2007, 2007 IEEE Asian Solid-State Circuits Conference.

[7]  Alexandra Fedorova,et al.  Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS XV.

[8]  G. Edward Suh,et al.  A new memory monitoring scheme for memory-aware scheduling and partitioning , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[9]  Lizy Kurian John,et al.  A bandwidth-aware memory-subsystem resource management using non-invasive resource profilers for large CMP systems , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[10]  Alexandra Fedorova,et al.  Managing Contention for Shared Resources on Multicore Processors , 2010 .

[11]  Hiroaki Kobayashi,et al.  A CACHE-AWARE THREAD SCHEDULING POLICY FOR MULTI-CORE PROCESSORS , 2008 .

[12]  S Jarp,et al.  Perfmon2: a leap forward in performance monitoring , 2008 .

[13]  Dimitrios S. Nikolopoulos,et al.  Scheduling algorithms with bus bandwidth considerations for SMPs , 2003, 2003 International Conference on Parallel Processing, 2003. Proceedings..

[14]  Pascal Bouvry,et al.  Memory-Aware Green Scheduling on Multi-core Processors , 2010, 2010 39th International Conference on Parallel Processing Workshops.

[15]  Balaram Sinharoy,et al.  POWER5 system microarchitecture , 2005, IBM J. Res. Dev..

[16]  Jichuan Chang,et al.  Cooperative cache partitioning for chip multiprocessors , 2007, ICS '07.

[17]  Balaram Sinharoy,et al.  POWER7: IBM's next generation server processor , 2010, 2009 IEEE Hot Chips 21 Symposium (HCS).

[18]  Jeffrey D. Gilbert,et al.  Over one million TPCC with a 45nm 6-core Xeon® CPU , 2009, 2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.

[19]  Lingjia Tang,et al.  The impact of memory subsystem resource sharing on datacenter applications , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).