Taming Non-Blocking Caches to Improve Isolation in Multicore Real-Time Systems

In this paper, we show that cache partitioning does not necessarily ensure predictable cache performance in modern COTS multicore platforms that use non-blocking caches to exploit memory- level-parallelism (MLP). Through carefully designed experiments using three real COTS multicore platforms (four distinct CPU architectures) and a cycle- accurate full system simulator, we show that special hardware registers in non-blocking caches, known as Miss Status Holding Registers (MSHRs), which track the status of outstanding cache-misses, can be a significant source of contention; we observe up to 21X WCET increase in a real COTS multicore platform due to MSHR contention. We propose a hardware and system software (OS) collaborative approach to efficiently eliminate MSHR contention for multicore real-time systems. Our approach includes a low-cost hardware extension that enables dynamic control of per-core MLP by the OS. Using the hardware extension, the OS scheduler then globally controls each core's MLP in such a way that eliminates MSHR contention and maximizes overall throughput of the system. We implement the hardware extension in a cycle- accurate fullsystem simulator and the scheduler modification in Linux 3.14 kernel. We evaluate the effectiveness of our approach using a set of synthetic and macro benchmarks. In a case study, we achieve up to 19% WCET reduction (average: 13%) for a set of EEMBC benchmarks compared to a baseline cache partitioning setup.

[1]  David Kroft,et al.  Lockup-free instruction fetch/prefetch cache organization , 1998, ISCA '81.

[2]  Andrew F. Glew MLP yes! ILP no , 1998, ASPLOS 1998.

[3]  G. Edward Suh,et al.  A new memory monitoring scheme for memory-aware scheduling and partitioning , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[4]  Josep Torrellas,et al.  Scalable Cache Miss Handling for High Memory-Level Parallelism , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[5]  Nuno Pereira,et al.  Static-Priority Scheduling over Wireless Networks with Multiple Broadcast Domains , 2007, RTSS 2007.

[6]  Steve Vestal,et al.  Preemptive Scheduling of Multi-criticality Systems with Varying Degrees of Execution Time Assurance , 2007, 28th IEEE International Real-Time Systems Symposium (RTSS 2007).

[7]  Serge J. Belongie,et al.  SD-VBS: The San Diego Vision Benchmark Suite , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[8]  Magnus Jahre,et al.  A light-weight fairness mechanism for chip multiprocessor memory systems , 2009, CF '09.

[9]  Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems , 2010, ASPLOS XV.

[10]  Bradford M. Beckmann,et al.  The gem5 simulator , 2011, CARN.

[11]  Magnus Jahre,et al.  A High Performance Adaptive Miss Handling Architecture for Chip Multiprocessors , 2011, Trans. High Perform. Embed. Archit. Compil..

[12]  Lei Liu,et al.  A software memory partition approach for eliminating bank-level interference in multicore systems , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[13]  Björn Andersson,et al.  Coordinated Bank and Cache Coloring for Temporal Protection of Memory Accesses , 2013, 2013 IEEE 16th International Conference on Computational Science and Engineering.

[14]  Marco Caccamo,et al.  Real-time cache management framework for multi-core architectures , 2013, 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS).

[15]  David Eklov,et al.  Bandwidth Bandit: Quantitative characterization of memory contention , 2012, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[16]  Ragunathan Rajkumar,et al.  A Coordinated Approach for Practical OS-Level Cache Management in Multi-core Real-Time Systems , 2013, 2013 25th Euromicro Conference on Real-Time Systems.

[17]  James H. Anderson,et al.  Outstanding Paper Award: Making Shared Caches More Predictable on Multicore Platforms , 2013, 2013 25th Euromicro Conference on Real-Time Systems.

[18]  Karthikeyan Sankaralingam,et al.  Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[19]  Ronald G. Dreslinski,et al.  Sources of error in full-system simulation , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[20]  Francisco J. Cazorla,et al.  A Dual-Criticality Memory Controller (DCmc): Proposal and Evaluation of a Space Case Study , 2014, 2014 IEEE Real-Time Systems Symposium.

[21]  Rodolfo Pellizzoni,et al.  PALLOC: DRAM bank-aware memory allocator for performance isolation on multicore platforms , 2014, 2014 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS).

[22]  Ying Ye,et al.  COLORIS: A dynamic cache partitioning system using page coloring , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[23]  Thomas F. Wenisch,et al.  Simulating DRAM controllers for future system architecture exploration , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[24]  Björn Andersson,et al.  Bounding memory interference delay in COTS-based multi-core systems , 2014, 2014 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS).

[25]  Wang Yi,et al.  Building timing predictable embedded systems , 2014, ACM Trans. Embed. Comput. Syst..

[26]  David Broman,et al.  A predictable and command-level priority-based DRAM controller for mixed-criticality systems , 2015, 21st IEEE Real-Time and Embedded Technology and Applications Symposium.

[27]  Rodolfo Pellizzoni,et al.  Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems , 2014, 2015 27th Euromicro Conference on Real-Time Systems.

[28]  James H. Anderson,et al.  Cache Sharing and Isolation Tradeoffs in Multicore Mixed-Criticality Systems , 2015, 2015 IEEE Real-Time Systems Symposium.

[29]  Robert I. Davis,et al.  Mixed Criticality Systems - A Review , 2015 .

[30]  Heechul Yun,et al.  MEDUSA: A Predictable and High-Performance DRAM Controller for Multicore Based Embedded Systems , 2015, 2015 IEEE 3rd International Conference on Cyber-Physical Systems, Networks, and Applications.

[31]  Heechul Yun Evaluating the Isolation Effect of Cache Partitioning on COTS Multicore Platforms , 2015 .

[32]  Frank Mueller,et al.  Providing task isolation via TLB coloring , 2015, 21st IEEE Real-Time and Embedded Technology and Applications Symposium.