Virtualizing on-chip distributed ScratchPad memories for low power and trusted application execution

Emerging multicore platforms are increasingly deploying distributed scratchpad memories (SPMs) to achieve lower energy and area together with higher predictability, but this requires transparent and efficient software management of these critical resources. In this paper, we introduce the concept of scratchpad memory virtualization: a hardware/software run-time layer (called SPMVisor) that virtualizes the scratchpad memory space in order to facilitate the use of distributed SPMs in an efficient, transparent and secure manner. We introduce the notion of virtual scratchpad memories (vSPMs), which can be dynamically created and managed as regular SPMs. The SPMVisor exploits policy-driven allocation strategies based on application privilege levels and data-level prioritization metrics (e.g., utilization) to efficiently manage the on-chip memory real estate. Our experimental results on Mediabench/CHStone benchmarks running on various chip-multiprocessor (CMP) configurations and software stacks (RTOS, virtualization, secure execution) showed that SPMVisor enhances performance by 71 % on average and reduces power consumption by 79 % on average with respect to traditional context switching schemes. We showed the benefits of using vSPMs in various environments (an RTOS multi-tasking environment, a virtualization environment, and a trusted execution environment). Furthermore, we explored the effects of mapping instructions and data onto vSPMs, and showed that sharing on-chip space reduces execution time and energy by an average of 16 % and 12 %, respectively. We then compared our priority-driven memory allocation scheme with traditional dynamic allocation and showed an average 54 % execution time reduction and 65 % energy savings. Finally, to further validate the SPMVisor's benefits, we modified the initial bus-based architecture to include a mesh-based CMP with up to 4×4 nodes.
We observed that SPMVisor's priority-driven allocator reduced execution time by an average of 17 % with respect to competing allocation policies, while saving an average of 65 % in energy across various architectural configurations. We also observed that SPMVisor reduced execution time by an average of 12.6 % with respect to competing allocation policies, while saving an average of 63.5 % in total energy for various architectural configurations running 1024 jobs.
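
To make the idea of priority-driven allocation concrete, the following is a minimal sketch, not the SPMVisor implementation. It assumes that each allocation request carries an application privilege level and a utilization figure, and that the allocator greedily fills a fixed on-chip capacity in priority order. The names `Request`, `allocate`, and `spm_capacity`, the tie-breaking rule, and all sizes are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str           # hypothetical label for a data block
    size: int           # bytes requested
    privilege: int      # assumed: higher value = more privileged application
    utilization: float  # assumed prioritization metric (e.g., accesses per byte)


def allocate(requests, spm_capacity):
    """Greedily place the highest-priority requests in on-chip SPM space.

    Requests are ranked by privilege first and utilization second.
    Returns two lists of names: (on_chip, off_chip).
    """
    ranked = sorted(
        requests,
        key=lambda r: (r.privilege, r.utilization),
        reverse=True,
    )
    free = spm_capacity
    on_chip, off_chip = [], []
    for r in ranked:
        if r.size <= free:
            on_chip.append(r.name)
            free -= r.size
        else:
            # Does not fit in the remaining on-chip space; spill off-chip.
            off_chip.append(r.name)
    return on_chip, off_chip


requests = [
    Request("crypto_key", 256, privilege=2, utilization=0.9),
    Request("frame_buf", 4096, privilege=1, utilization=0.2),
    Request("lookup_tbl", 1024, privilege=1, utilization=0.8),
]
on_chip, off_chip = allocate(requests, spm_capacity=2048)
print(on_chip, off_chip)
```

In this toy run the privileged key material and the heavily used table stay on chip, while the large, rarely accessed buffer is spilled off chip. A real policy would also have to handle eviction, vSPM creation and teardown, and isolation between applications, none of which is modeled here.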
