ShaVe-ICE

Traditional approaches to managing software-programmable memories (SPMs) do not support sharing of distributed on-chip memory resources and consequently miss the opportunity to utilize those resources more effectively. Managing on-chip memory in many-core embedded systems with distributed SPMs requires runtime support for sharing memory among concurrently running threads with different memory demands. Runtime SPM managers cannot rely on prior knowledge of the dynamically changing mix of threads that will execute, so they should be designed to enable SPM allocation for any unpredictable mix of threads contending for on-chip memory space. This article proposes ShaVe-ICE, an operating-system-level solution with hardware support that virtualizes and ultimately shares SPM resources across a many-core embedded system to reduce the average memory latency. We present a number of simple allocation policies to improve performance and energy consumption. Experimental results show that sharing SPMs can reduce the average execution time of the workload by up to 19.5% and the dynamic energy consumed in the memory subsystem by up to 14%.
