Co-Scheduling Algorithms for Cache-Partitioned Systems

Cache-partitioned architectures allow subsections of the shared last-level cache (LLC) to be exclusively reserved for some applications. This technique dramatically limits interactions between applications that execute concurrently on a multicore machine. Consider n applications that execute concurrently, with the objective of minimizing the makespan, defined as the maximum completion time of the n applications. Two key scheduling questions arise: (i) what fraction of the cache and (ii) how many processors should each application receive? Here, each application may be assigned a rational number of processors, since processors can be shared across applications through multi-threading. In this paper, we answer (i) and (ii) for perfectly parallel applications. Even though the problem is shown to be NP-complete, we provide key elements for determining the subset of applications that should share the LLC (while the remaining ones use only their smaller private caches). Building upon these results, we design efficient heuristics for general applications. Extensive simulations demonstrate the usefulness of co-scheduling when our efficient cache partitioning strategies are deployed.
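
To make the processor-allocation question (ii) concrete, the following minimal sketch covers only a simplified setting and is not the paper's algorithm. It assumes the cache fractions are already fixed, so that each perfectly parallel application i has a known amount of sequential work `work[i]` and runs in time `work[i] / procs[i]` on a rational number `procs[i]` of processors. The function name `equal_finish_allocation` and the cost model are illustrative assumptions. Under these assumptions, giving each application a share of processors proportional to its work makes all applications finish at the same time, which is the smallest makespan achievable here, because any application finishing early could hand processors to the one finishing last.

```python
from fractions import Fraction


def equal_finish_allocation(work, total_procs):
    """Split total_procs among perfectly parallel applications.

    Assumed model (not taken from the paper): application i takes
    work[i] / procs[i] time units, with cache effects already folded
    into work[i]. Returns (shares, makespan) as exact rationals.
    """
    work = [Fraction(w) for w in work]
    total_work = sum(work)
    if total_procs <= 0 or total_work <= 0 or any(w <= 0 for w in work):
        raise ValueError("work amounts and processor count must be positive")

    # Shares proportional to work: every application finishes together.
    shares = [Fraction(total_procs) * w / total_work for w in work]
    makespan = total_work / Fraction(total_procs)
    return shares, makespan


if __name__ == "__main__":
    shares, makespan = equal_finish_allocation([6, 3, 1], 4)
    for w, p in zip([6, 3, 1], shares):
        print(f"work={w} procs={p} time={Fraction(w) / p}")
    print("makespan:", makespan)
```

Exact rationals from `fractions.Fraction` are used because the model allows fractional processor counts, and they let the equal-completion-time property be checked without rounding error. Choosing the cache fractions themselves, question (i), is the hard part of the problem and is not addressed by this sketch.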
