Reducing Shared Cache Contention by Scheduling Order Adjustment on Commodity Multi-cores

Due to power and complexity limitations of traditional single-core processors, multi-core processors have become mainstream. A key feature of commodity multi-cores is that the last-level cache (LLC) is usually shared, and contention for this shared cache can significantly degrade application performance. Several existing proposals demonstrate that task co-scheduling has the potential to alleviate this contention, but making co-scheduling practical in commodity operating systems remains challenging. In this paper, we propose two lightweight, practical cache-aware co-scheduling methods, namely static SOA and dynamic SOA, to mitigate the cache contention problem on commodity multi-cores. The central idea of both methods is that cache contention can be reduced by properly adjusting the scheduling order. The two methods differ mainly in how they acquire each process's cache requirement: static SOA (static scheduling order adjustment) acquires the cache requirement information through offline profiling, while dynamic SOA (dynamic scheduling order adjustment) captures cache requirement statistics at run time using hardware performance counters. Experimental results with multi-programmed NAS workloads suggest that the proposed methods can greatly reduce the effect of cache contention on multi-core systems. Specifically, static SOA reduces execution time by up to 15.7% and the number of cache misses by up to 11.8%, and the performance improvement persists across different cache sizes and time-slice lengths; dynamic SOA reduces execution time by up to 7.09%.
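The core idea of scheduling order adjustment can be illustrated with a minimal sketch. The sketch below is a hypothetical illustration, not the paper's actual algorithm: it assumes each task's cache demand is already known (e.g., from offline profiling, as in static SOA) and reorders the run queue so that tasks co-scheduled onto the shared LLC in the same time slice pair a cache-hungry task with a cache-light one, keeping the combined demand of any pair low. All function and variable names are invented for illustration.

```python
# Hypothetical sketch: reorder tasks so that co-running pairs combine a
# high-cache-demand task with a low-cache-demand one, reducing pressure
# on the shared last-level cache. Cache demands are assumed to come from
# offline profiling (static SOA) or performance counters (dynamic SOA).
def adjust_schedule_order(tasks):
    """tasks: list of (name, cache_demand) tuples.

    Returns a list of co-run groups for successive time slices on a
    dual-core chip sharing one LLC.
    """
    # Sort tasks by cache demand, then pair the heaviest remaining task
    # with the lightest remaining task.
    ordered = sorted(tasks, key=lambda t: t[1])
    schedule = []
    lo, hi = 0, len(ordered) - 1
    while lo < hi:
        schedule.append((ordered[hi][0], ordered[lo][0]))  # co-run pair
        lo += 1
        hi -= 1
    if lo == hi:  # odd task count: last task runs alone
        schedule.append((ordered[lo][0],))
    return schedule
```

With tasks `[("A", 8), ("B", 1), ("C", 6), ("D", 2)]`, the sketch co-runs A with B and C with D, so neither time slice combines the two most cache-hungry tasks.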
