Scratchpad Memory Architectures and Allocation Algorithms for Hard Real-Time Multicore Processors

Time predictability is crucial in hard real-time and safety-critical systems. Cache memories improve average-case memory performance but are not time predictable, especially when they are shared among the cores of a multicore processor. To achieve time predictability while minimizing the impact on performance, this paper explores several time-predictable scratchpad memory (SPM) based architectures for multicore processors. To support these architectures, we propose three allocation strategies: a partition strategy based on dynamic allocation of memory objects, a partition strategy based on static allocation, and a priority L2 SPM strategy based on static allocation. Each retains time predictability while attempting to maximize performance and energy efficiency. Together, the SPM based multicore architectural designs and the associated allocation methods form a comprehensive solution for hard real-time multicore computing. Our experimental results indicate the strengths and weaknesses of each proposed architecture and allocation method, offering a range of on-chip memory design options for enabling multicore platforms in hard real-time systems.
