Heap data management for limited local memory (LLM) multi-core processors

This paper presents a scheme to manage heap data in the local memory present in each core of a limited local memory (LLM) multi-core processor. While it is possible to manage heap data semi-automatically using a software cache, managing the heap data of one core through a software cache may require changing the code of the other threads. Cross-thread modifications are difficult to code and debug, and they only become more difficult as the number of cores scales. We propose a semi-automatic and scalable scheme for heap data management that hides this complexity in a library with a much more natural programming interface. Furthermore, for embedded applications, where the maximum heap size can be known at compile time, we propose optimizations to the heap management that significantly improve application performance. Experiments on several MiBench benchmarks executing on the Sony PlayStation 3 show that our scheme is easier to use and that, when the maximum heap size is known, our optimizations improve application performance by an average of 14%.
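The following C sketch illustrates the general idea of library-managed heap data on an LLM core. It is not the paper's library. It keeps each heap object in main memory and stages it through a small buffer in local memory, and every access goes through an explicit address-translation call. All names and sizes (`heap_malloc`, `heap_g2l`, `heap_flush`, `BLOCK_SIZE`, `LOCAL_BLOCKS`) are hypothetical. Main memory is simulated by an array, and DMA transfers are simulated with `memcpy`. The buffer is organized as a direct-mapped cache, and the sketch covers a single core with no `free`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, illustrative sizes (not taken from the paper). */
#define BLOCK_SIZE   64u          /* bytes per local-memory block            */
#define LOCAL_BLOCKS 4u           /* blocks reserved for heap in local memory */
#define GLOBAL_SIZE  (1u << 16)   /* simulated main-memory heap, in bytes    */
#define HEAP_NULL    UINT32_MAX   /* "null" global address                   */

/* Simulated main memory; on a real LLM core it is reachable only via DMA. */
static _Alignas(16) uint8_t global_mem[GLOBAL_SIZE];
static uint32_t global_top;       /* bump pointer used by heap_malloc */

/* Direct-mapped buffer of heap blocks kept in the core's local memory. */
static _Alignas(16) uint8_t local_mem[LOCAL_BLOCKS][BLOCK_SIZE];
static int64_t tag[LOCAL_BLOCKS];   /* global block number held, or -1 */
static int     dirty[LOCAL_BLOCKS];
unsigned long  heap_misses;

/* Stand-ins for DMA transfers between local and main memory. */
static void dma_get(void *dst, uint32_t gaddr, size_t n) { memcpy(dst, global_mem + gaddr, n); }
static void dma_put(uint32_t gaddr, const void *src, size_t n) { memcpy(global_mem + gaddr, src, n); }

void heap_init(void) {
    for (unsigned i = 0; i < LOCAL_BLOCKS; i++) { tag[i] = -1; dirty[i] = 0; }
    global_top = 0;
    heap_misses = 0;
}

/* Allocates n bytes in main memory and returns a global address.
 * Objects are 8-byte aligned and never straddle a block boundary. */
uint32_t heap_malloc(size_t n) {
    if (n == 0 || n > BLOCK_SIZE) return HEAP_NULL;
    n = (n + 7) & ~(size_t)7;
    uint32_t off = global_top % BLOCK_SIZE;
    if (off + n > BLOCK_SIZE) global_top += BLOCK_SIZE - off;  /* skip to next block */
    if (global_top + n > GLOBAL_SIZE) return HEAP_NULL;
    uint32_t g = global_top;
    global_top += (uint32_t)n;
    return g;
}

/* Translates a global address to a local pointer. On a miss it writes back
 * a dirty victim block and fetches the needed block. The returned pointer is
 * valid only until the next heap_g2l call, which may evict its block. */
void *heap_g2l(uint32_t gaddr, int will_write) {
    assert(gaddr < global_top);
    uint32_t blk = gaddr / BLOCK_SIZE;
    uint32_t line = blk % LOCAL_BLOCKS;
    if (tag[line] != (int64_t)blk) {
        if (tag[line] >= 0 && dirty[line])
            dma_put((uint32_t)tag[line] * BLOCK_SIZE, local_mem[line], BLOCK_SIZE);
        dma_get(local_mem[line], blk * BLOCK_SIZE, BLOCK_SIZE);
        tag[line] = blk;
        dirty[line] = 0;
        heap_misses++;
    }
    if (will_write) dirty[line] = 1;
    return local_mem[line] + gaddr % BLOCK_SIZE;
}

/* Writes every dirty block back to main memory. */
void heap_flush(void) {
    for (unsigned i = 0; i < LOCAL_BLOCKS; i++)
        if (tag[i] >= 0 && dirty[i]) {
            dma_put((uint32_t)tag[i] * BLOCK_SIZE, local_mem[i], BLOCK_SIZE);
            dirty[i] = 0;
        }
}
```

Because `heap_g2l` may evict the block it returned, callers should re-translate after any intervening call instead of holding on to the local pointer. A real implementation would use the platform's DMA primitives in place of `memcpy` and would support deallocation. It might also be specialized when the maximum heap size is known at compile time, as the abstract describes. None of that is modeled here.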
