Paper Information - Hiding Communication Delays in Contention-Free Execution for SPM-Based Multi-Core Architectures

Multi-core systems using ScratchPad Memories (SPMs) are attractive architectures for executing time-critical embedded applications, because they provide both predictability and performance. In this paper, we propose a scheduling technique that jointly selects SPM contents off-line, in such a way that the cost of SPM loading/unloading is hidden. Communications are fragmented to increase the opportunities for hiding. Experimental results show the effectiveness of the proposed technique on streaming applications and synthetic task graphs. Overlapping communications with computations reduces the length of generated schedules by 4% on average for streaming applications, with a maximum of 16%, and by 8% on average for synthetic task graphs. We further show in a case study that generated schedules can be implemented with low overhead on a predictable multi-core architecture (Kalray MPPA).
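To give a feel for why overlapping communications with computations shortens a schedule, here is a minimal toy sketch. It is not the scheduling technique of the paper: it assumes a single core, a single DMA engine, a fixed task order, and room in the SPM for exactly one prefetched input (double buffering). The function names and the `(read, execute, write)` timing triples are hypothetical and chosen only for illustration.

```python
# Toy model (an assumption, not the paper's formulation): each task is a
# triple (read, execute, write) of durations in arbitrary time units.

def blocking_length(tasks):
    """Schedule length when every SPM load/unload blocks the core."""
    return sum(r + e + w for r, e, w in tasks)

def overlapped_length(tasks):
    """Schedule length when the DMA engine prefetches the next input
    and writes back results while the core is computing."""
    n = len(tasks)
    if n == 0:
        return 0
    read_done = [0] * n
    core_free = 0
    # The first input cannot be hidden behind any computation.
    dma_free = tasks[0][0]
    read_done[0] = dma_free
    for i, (r, e, w) in enumerate(tasks):
        # Prefetch the input of task i+1 while task i executes.
        if i + 1 < n:
            dma_free += tasks[i + 1][0]
            read_done[i + 1] = dma_free
        # Task i starts once the core is idle and its input is loaded.
        exec_end = max(core_free, read_done[i]) + e
        core_free = exec_end
        # Write back the output once the DMA engine is available.
        dma_free = max(dma_free, exec_end) + w
    return max(core_free, dma_free)

tasks = [(2, 5, 1), (2, 5, 1), (2, 5, 1)]
print(blocking_length(tasks))    # 24
print(overlapped_length(tasks))  # 18
```

In this toy instance only the first load and the last write-back remain on the critical path; the other transfers are hidden behind computation. The gains reported in the abstract come from the authors' own experiments and are unrelated to these illustrative numbers.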
