Deterministic Memory Abstraction and Supporting Multicore System Architecture

Poor time predictability of multicore processors has been a long-standing challenge in the real-time systems community. In this paper, we argue that a fundamental problem preventing efficient and predictable real-time computing on multicore platforms is the lack of a proper memory abstraction for expressing memory criticality. This concern cuts across multiple layers of the system: the application, the OS, and the hardware. We therefore propose a new holistic resource management approach driven by a new memory abstraction, which we call Deterministic Memory. The key characteristic of deterministic memory is that the platform (the OS and hardware) guarantees small and tightly bounded worst-case memory access timing. In contrast, we refer to the conventional memory abstraction as best-effort memory, for which only highly pessimistic worst-case bounds can be achieved. We propose using both abstractions together to achieve high time predictability without significantly sacrificing performance. We present deterministic memory-aware OS and architecture designs, including an OS-level page allocator, a hardware-level cache, and a DRAM controller. We implement the proposed OS and architecture extensions in Linux and the gem5 simulator. Our evaluation results, using a set of synthetic and real-world benchmarks, demonstrate the feasibility and effectiveness of our approach.
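The abstract does not say how the deterministic/best-effort distinction is carried from the OS down to the memory hardware. The following is a minimal sketch under two loudly stated assumptions: memory type is tracked per page (modeled here as a set of page numbers the OS has marked deterministic), and a memory controller gives pending deterministic requests strict priority over best-effort ones. The names `MemType`, `classify`, `TwoClassScheduler`, `PAGE_SIZE`, and the page set are hypothetical, and strict priority is a simplification chosen for illustration. It is not the paper's DRAM controller design.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class MemType(Enum):
    DETERMINISTIC = 0
    BEST_EFFORT = 1


@dataclass
class Request:
    core: int
    addr: int
    mem_type: MemType


def classify(addr: int, det_pages: Set[int]) -> MemType:
    """Tag an access by looking up its page number in a set of pages the OS
    has marked deterministic (a hypothetical stand-in for a per-page attribute)."""
    if addr // PAGE_SIZE in det_pages:
        return MemType.DETERMINISTIC
    return MemType.BEST_EFFORT


class TwoClassScheduler:
    """Toy request scheduler: any pending deterministic request is served before
    any best-effort request, FIFO within each class. No DRAM timing is modeled."""

    def __init__(self) -> None:
        self.queues = {
            MemType.DETERMINISTIC: deque(),
            MemType.BEST_EFFORT: deque(),
        }

    def enqueue(self, req: Request) -> None:
        self.queues[req.mem_type].append(req)

    def next_request(self) -> Optional[Request]:
        # Check the deterministic queue first, then fall back to best-effort.
        for mem_type in (MemType.DETERMINISTIC, MemType.BEST_EFFORT):
            if self.queues[mem_type]:
                return self.queues[mem_type].popleft()
        return None


if __name__ == "__main__":
    det_pages = {0x10}  # hypothetical: page 0x10 marked deterministic by the OS
    sched = TwoClassScheduler()
    for core, addr in [(1, 0x20000), (0, 0x10040), (1, 0x20040)]:
        sched.enqueue(Request(core, addr, classify(addr, det_pages)))
    order = [sched.next_request().addr for _ in range(3)]
    print([hex(a) for a in order])  # ['0x10040', '0x20000', '0x20040']
```

Although the deterministic request from core 0 arrives second, it is served first. Strict priority like this can starve best-effort traffic under sustained deterministic load, so a practical design would need some bound on best-effort delay. That trade-off is exactly the predictability-versus-performance balance the abstract describes.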
