Hiding Communication Delays in Contention-Free Execution for SPM-Based Multi-Core Architectures

Multi-core systems using ScratchPad Memories (SPMs) are attractive architectures for executing time-critical embedded applications, because they provide both predictability and performance. In this paper, we propose a technique that schedules tasks and selects SPM contents jointly, off-line, in such a way that the cost of SPM loading/unloading is hidden behind computation. Communications are fragmented to increase the opportunities for hiding them. Experimental results show the effectiveness of the proposed technique on streaming applications and synthetic task graphs. Overlapping communications with computations reduces the length of generated schedules by 4% on average on streaming applications, with a maximum of 16%, and by 8% on average on synthetic task graphs. A case study further shows that the generated schedules can be implemented with low overhead on a predictable multi-core architecture (Kalray MPPA).
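As an illustration only, the following minimal sketch shows why overlapping and fragmentation can shorten a schedule. It is not the scheduling technique proposed in the paper. It assumes a single core with one DMA engine, enough SPM space to prefetch ahead, and a toy model of bus-idle windows. All names (`blocking_length`, `overlapped_length`, `hidden_time`) and all numbers are hypothetical.

```python
# Toy model (assumption): each task is a pair (load_time, compute_time).
# A task may start only once its data has been fully loaded into the SPM.

def blocking_length(tasks):
    """Schedule length when every load blocks the core."""
    return sum(load + compute for load, compute in tasks)

def overlapped_length(tasks):
    """Schedule length when the load of the next task runs on the DMA
    engine while the core computes the current task.
    Assumes the SPM is large enough to hold prefetched data."""
    dma_free = 0   # time at which the DMA engine becomes free
    core_free = 0  # time at which the core becomes free
    for load, compute in tasks:
        load_end = dma_free + load          # loads are serialized on the DMA
        dma_free = load_end
        start = max(core_free, load_end)    # wait for both core and data
        core_free = start + compute
    return core_free

def hidden_time(load, idle_windows, fragmented):
    """Amount of a transfer that fits into bus-idle windows occurring
    while the core computes (toy assumption: an unfragmented transfer
    needs one window at least as long as the whole transfer)."""
    if fragmented:
        return min(load, sum(idle_windows))
    return load if any(w >= load for w in idle_windows) else 0

tasks = [(2, 5), (3, 4), (1, 6)]  # hypothetical (load, compute) durations
print(blocking_length(tasks), overlapped_length(tasks))
print(hidden_time(4, [2, 3], fragmented=False),
      hidden_time(4, [2, 3], fragmented=True))
```

In this toy model, only the first load remains exposed once transfers overlap with computation, and a transfer that fits in no single idle window can still be hidden once it is split across several.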
