Survey of scheduling techniques for addressing shared resources in multicore processors

Chip multicore processors (CMPs) have emerged as the dominant architecture choice for modern computing platforms and will most likely continue to be dominant well into the foreseeable future. As with any system, CMPs offer a unique set of challenges. Chief among them is the shared resource contention that results because CMP cores are not independent processors but rather share common resources among cores such as the last level cache (LLC). Shared resource contention can lead to severe and unpredictable performance impact on the threads running on the CMP. Conversely, CMPs offer tremendous opportunities for mulithreaded applications, which can take advantage of simultaneous thread execution as well as fast inter thread data sharing. Many solutions have been proposed to deal with the negative aspects of CMPs and take advantage of the positive. This survey focuses on the subset of these solutions that exclusively make use of OS thread-level scheduling to achieve their goals. These solutions are particularly attractive as they require no changes to hardware and minimal or no changes to the OS. The OS scheduler has expanded well beyond its original role of time-multiplexing threads on a single core into a complex and effective resource manager. This article surveys a multitude of new and exciting work that explores the diverse new roles the OS scheduler can successfully take on.

[1]  Engin Ipek,et al.  Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[2]  Onur Mutlu,et al.  Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems , 2007, USENIX Security Symposium.

[3]  Xipeng Shen,et al.  A study on optimally co-scheduling jobs of different lengths on chip multiprocessors , 2009, CF '09.

[4]  Michael Stumm,et al.  Enhancing operating system support for multicore processors by using hardware performance monitoring , 2009, OPSR.

[5]  Doug Burger,et al.  An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.

[6]  Yang Zhang,et al.  Corey: An Operating System for Many Cores , 2008, OSDI.

[7]  Hiroaki Kobayashi,et al.  Modeling of cache access behavior based on Zipf's law , 2008, MEDEA '08.

[8]  Mohammad Banikazemi,et al.  PAM: A novel performance/power aware meta-scheduler for multi-core systems , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.

[9]  Alexandra Fedorova,et al.  Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS 2010.

[10]  Rajeev Balasubramonian,et al.  Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures , 2000, MICRO 33.

[11]  Tong Li,et al.  Using OS Observations to Improve Performance in Multicore Systems , 2008, IEEE Micro.

[12]  S. Kim,et al.  Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[13]  Ravi R. Iyer,et al.  CQoS: a framework for enabling QoS in shared caches of CMP platforms , 2004, ICS '04.

[14]  Gabriel H. Loh,et al.  Dynamic Classification of Program Memory Behaviors in CMPs , 2008 .

[15]  Michael Stumm,et al.  RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations , 2009, ASPLOS.

[16]  Won-Taek Lim,et al.  Architectural support for operating system-driven CMP cache management , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[17]  Peter Petrov,et al.  Eliminating inter-process cache interference through cache reconfigurability for real-time and low-power embedded multi-tasking systems , 2007, CASES '07.

[18]  Michael D. Smith,et al.  Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[19]  Keshav Pingali,et al.  Optimistic parallelism requires abstractions , 2007, PLDI '07.

[20]  James E. Smith,et al.  Fair Queuing Memory Systems , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[21]  William J. Dally,et al.  Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[22]  Frank Vahid,et al.  A One-Shot Configurable-Cache Tuner for Improved Energy and Performance , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.

[23]  Bradley C. Kuszmaul,et al.  Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.

[24]  Michael Stumm,et al.  Path: page access tracking to improve memory management , 2007, ISMM '07.

[25]  Erik Hagersten,et al.  StatCache: a probabilistic approach to efficient and accurate data locality analysis , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[26]  Aamer Jaleel,et al.  Adaptive insertion policies for managing shared caches , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[27]  Allan Snavely,et al.  Accurate memory signatures and synthetic address traces for HPC applications , 2008, ICS '08.

[28]  Jie Chen,et al.  Analysis and approximation of optimal co-scheduling on Chip Multiprocessors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[29]  David H. Albonesi,et al.  Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[30]  Yan Solihin,et al.  Predicting inter-thread cache contention on a chip multi-processor architecture , 2005, 11th International Symposium on High-Performance Computer Architecture.

[31]  Srihari Makineni,et al.  Communist, Utilitarian, and Capitalist cache policies on CMPs: Caches as a shared resource , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[32]  Yan Solihin,et al.  An analytical model for cache replacement policy performance , 2006, SIGMETRICS '06/Performance '06.

[33]  共立出版株式会社 コンピュータ・サイエンス : ACM computing surveys , 1978 .

[34]  Tong Li,et al.  Efficient operating system scheduling for performance-asymmetric multi-core architectures , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[35]  Chen Ding,et al.  Program locality analysis using reuse distance , 2009, TOPL.

[36]  Babak Falsafi,et al.  Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.

[37]  Xiao Zhang,et al.  Towards practical page coloring-based multicore cache management , 2009, EuroSys '09.

[38]  Lieven Eeckhout,et al.  Microarchitecture-Independent Workload Characterization , 2007, IEEE Micro.

[39]  Alexandra Fedorova,et al.  Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS XV.

[40]  Yan Solihin,et al.  QoS policies and architecture for cache/memory in CMP platforms , 2007, SIGMETRICS '07.

[41]  Simon Fraser User-level scheduling on NUMA multicore systems under Linux , 2011 .

[42]  Gurindar S. Sohi,et al.  Serialization sets: a dynamic dependence-based parallel execution model , 2009, PPoPP '09.

[43]  Zhen Yang,et al.  CMP cache performance projection: accessibility vs. capacity , 2007, CARN.

[44]  Per Stenström,et al.  An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[45]  Gabriel H. Loh,et al.  3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.

[46]  Robert Tappan Morris,et al.  Locating cache performance bottlenecks using data profiling , 2010, EuroSys '10.

[47]  Frank Vahid,et al.  A table-based method for single-pass cache optimization , 2008, GLSVLSI '08.

[48]  Gabriel H. Loh,et al.  PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches , 2009, ISCA '09.

[49]  Mahmut T. Kandemir,et al.  Adaptive set pinning: managing shared caches in chip multiprocessors , 2008, ASPLOS.

[50]  Xiaoning Ding,et al.  MCC-DB: Minimizing Cache Conflicts in Multi-core Processors for Databases , 2009, Proc. VLDB Endow..

[51]  Irving L. Traiger,et al.  Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..

[52]  Andrew Brownsword,et al.  Synchronization via scheduling: techniques for efficiently managing shared state , 2011, PLDI '11.

[53]  Hiroshi Nakamura,et al.  Improving fairness, throughput and energy-efficiency on a chip multiprocessor through DVFS , 2007, CARN.

[54]  Yun Liang,et al.  Static analysis for fast and accurate design space exploration of caches , 2008, CODES+ISSS '08.

[55]  D. Burger,et al.  Memory Bandwidth Limitations of Future Microprocessors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[56]  ZhuravlevSergey,et al.  Survey of scheduling techniques for addressing shared resources in multicore processors , 2012 .

[57]  Michael Stumm,et al.  Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[58]  Zeshan Chishti,et al.  Optimizing replication, communication, and capacity allocation in CMPs , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[59]  James Reinders,et al.  Intel threading building blocks - outfitting C++ for multi-core processor parallelism , 2007 .

[60]  David A. Padua,et al.  Compile-Time Based Performance Prediction , 1999, LCPC.

[61]  Onur Mutlu,et al.  A Case for MLP-Aware Cache Replacement , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[62]  John Turek,et al.  Optimal Partitioning of Cache Memory , 1992, IEEE Trans. Computers.

[63]  Francisco J. Cazorla,et al.  FlexDCP: a QoS framework for CMP architectures , 2009, OPSR.

[64]  Adrian Schüpbach,et al.  Design principles for end-to-end multicore schedulers , 2010 .

[65]  Mahmut T. Kandemir,et al.  Organizing the last line of defense before hitting the memory wall for CMPs , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[66]  Mainak Chaudhuri PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[67]  Onur Mutlu,et al.  Self-Optimizing Memory Controllers: A Reinforcement Learning Approach , 2008, 2008 International Symposium on Computer Architecture.

[68]  Vana Kalogeraki,et al.  FACT: a framework for adaptive contention-aware thread migrations , 2011, CF '11.

[69]  Nectarios Koziris,et al.  Memory bandwidth aware scheduling for SMP cluster nodes , 2005, 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing.

[70]  James E. Smith,et al.  Virtual private caches , 2007, ISCA '07.

[71]  J. Kubiatowicz,et al.  Resource Management in the Tessellation Manycore OS ∗ , 2010 .

[72]  Michael Stumm,et al.  Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors , 2007, EuroSys '07.

[73]  Rohit Chandra,et al.  Parallel programming in openMP , 2000 .

[74]  Mor Harchol-Balter,et al.  ATLAS : A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers , 2010 .

[75]  Ali Kamali SHARING AWARE SCHEDULING ON MULTICORE SYSTEMS , 2010 .

[76]  Zhao Zhang,et al.  Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[77]  Hiroaki Kobayashi,et al.  A power-aware shared cache mechanism based on locality assessment of memory reference for CMPs , 2007, MEDEA '07.

[78]  Jochen Liedtke,et al.  OS-controlled cache predictability for real-time systems , 1997, Proceedings Third IEEE Real-Time Technology and Applications Symposium.

[79]  Onur Mutlu,et al.  Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[80]  Sanjeev Kumar,et al.  Dynamic tracking of page miss ratio curve for memory management , 2004, ASPLOS XI.

[81]  Kathleen Knobe,et al.  Ease of use with concurrent collections (CnC) , 2009 .

[82]  Calvin Lin,et al.  Adaptive History-Based Memory Schedulers , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[83]  Yale N. Patt,et al.  Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[84]  Jichuan Chang,et al.  Cooperative cache partitioning for chip multiprocessors , 2007, ICS '07.

[85]  Christoforos E. Kozyrakis,et al.  From chaos to QoS: case studies in CMP resource management , 2007, CARN.

[86]  Xipeng Shen,et al.  Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? , 2010, PPoPP '10.

[87]  Frank Bellosa,et al.  Resource-conscious scheduling for energy efficiency on multicore processors , 2010, EuroSys '10.

[88]  Sangyeun Cho,et al.  Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[89]  Zhao Zhang,et al.  Enabling software management for multicore caches with a lightweight hardware support , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[90]  Yuan Yu,et al.  Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.

[91]  Alexandra Fedorova,et al.  A case for NUMA-aware contention management on multicore systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[92]  Dimitrios S. Nikolopoulos,et al.  Scheduling algorithms for effective thread pairing on hybrid multiprocessors , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[93]  Kevin Klues,et al.  Processes and Resource Management in a Scalable Many-core OS ∗ , 2010 .

[94]  Susan J. Eggers,et al.  Impact of sharing-based thread placement on multithreaded architectures , 1994, ISCA '94.

[95]  Adrian Schüpbach,et al.  The multikernel: a new OS architecture for scalable multicore systems , 2009, SOSP '09.

[96]  Onur Mutlu,et al.  Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems , 2008, 2008 International Symposium on Computer Architecture.

[97]  Yun Liang,et al.  Cache modeling in probabilistic execution time analysis , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[98]  Steven A. Hofmeyr,et al.  Load balancing on speed , 2010, PPoPP '10.

[99]  Mahmut T. Kandemir,et al.  Scheduler-based DRAM energy management , 2002, DAC '02.

[100]  G. Edward Suh,et al.  Dynamic Partitioning of Shared Cache Memory , 2004, The Journal of Supercomputing.

[101]  Haibo Chen,et al.  Tiled-MapReduce: Optimizing resource usages of data-parallel applications on multicore with tiling , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[102]  Peter J. Denning,et al.  The working set model for program behavior , 1968, CACM.

[103]  Dean M. Tullsen,et al.  Symbiotic jobscheduling with priorities for a simultaneous multithreading processor , 2002, SIGMETRICS '02.

[104]  T. N. Vijaykumar,et al.  Optimizing Replication, Communication, and Capacity Allocation in CMPs , 2005, ISCA 2005.

[105]  Lei Wang,et al.  Thread-Associative Memory for Multicore and Multithreaded Computing , 2006, ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design.

[106]  Zhen Fang,et al.  ACCESS: Smart scheduling for asymmetric cache CMPs , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[107]  G. Edward Suh,et al.  A new memory monitoring scheme for memory-aware scheduling and partitioning , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[108]  Li Zhao,et al.  CacheScouts: Fine-Grain Monitoring of Shared Caches in CMP Platforms , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[109]  Rajeev Balasubramonian,et al.  Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[110]  Mahmut T. Kandemir,et al.  A case for integrated processor-cache partitioning in chip multiprocessors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.