Segment Streaming for the Three-Phase Execution Model: Design and Implementation

Scheduling tasks using the three-phase execution model (load-execute-unload) can effectively reduce contention on shared resources in real-time systems. Due to system and program constraints, a task is generally split into segments that execute over multiple intervals. Several works have shown that co-scheduling memory (unload-load) and computation phases can improve system schedulability by hiding the memory transfer time. However, this overlap is limited to segments of different tasks, so segments of the same task cannot execute back-to-back. In this paper, we propose a new streaming model that allows the memory and execution phases of segments of the same task to overlap. This is accomplished by a segmentation framework implemented within an LLVM-based compiler-level tool, along with a Real-Time Operating System (RTOS) API to handle load/unload requests. Memory phases are processed by a DMA engine that loads and unloads the task content to and from ScratchPad Memory (SPM). We provide a schedulability analysis of the proposed model under a fixed-priority partitioned scheme and an RTOS implementation of the API on a latest-generation Multiprocessor System-on-Chip (MPSoC).
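To make the benefit of overlapping concrete, the following is a minimal timing sketch, not the paper's schedulability analysis. It compares running the segments of one task strictly in sequence against streaming them, under several assumptions that are ours rather than the paper's: the SPM is split into two partitions used alternately, a single DMA engine serves transfers in a fixed order (next load, then current unload), and no other task interferes. The function names and the `(load, execute, unload)` tuple layout are hypothetical.

```python
def sequential_time(segments):
    """Total time when each segment runs load, execute, unload with no overlap."""
    return sum(load + execute + unload for load, execute, unload in segments)


def streamed_time(segments):
    """Completion time when DMA transfers overlap with execution of the same task.

    Assumptions (illustrative only): two SPM partitions used alternately,
    one DMA engine, and a fixed DMA order of L0, L1, U0, L2, U1, ...
    """
    n = len(segments)
    if n == 0:
        return 0

    load_done = [0] * n
    exec_done = [0] * n
    unload_done = [0] * n

    # The first load cannot be hidden behind any execution phase.
    dma_free = segments[0][0]
    load_done[0] = dma_free
    cpu_free = 0

    for i in range(n):
        # Segment i executes once its load is complete and the CPU is free.
        exec_done[i] = max(cpu_free, load_done[i]) + segments[i][1]
        cpu_free = exec_done[i]

        if i + 1 < n:
            # Segment i+1 reuses the partition of segment i-1, so its load
            # must wait until that partition has been unloaded.
            partition_free = unload_done[i - 1] if i >= 1 else 0
            start = max(dma_free, partition_free)
            load_done[i + 1] = start + segments[i + 1][0]
            dma_free = load_done[i + 1]

        # Segment i is unloaded after it finishes and the DMA engine is free.
        unload_done[i] = max(dma_free, exec_done[i]) + segments[i][2]
        dma_free = unload_done[i]

    return unload_done[-1]


if __name__ == "__main__":
    # Three identical segments: 2 time units to load, 5 to execute, 2 to unload.
    task = [(2, 5, 2)] * 3
    print(sequential_time(task), streamed_time(task))
```

In this toy model only the first load and the last unload stay fully exposed when execution phases are longer than the transfers; with a single segment there is nothing to overlap and both functions return the same value.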
