Data Placement and Duplication for Embedded Multicore Systems With Scratch Pad Memory

Scratch pad memories (SPMs) are attractive alternatives to caches on multicore systems, since caches are relatively expensive in terms of area and energy consumption. The key to using SPMs effectively on multicore systems is the data placement algorithm. This paper proposes two polynomial-time algorithms, regional data placement for multicore (RDPM) and regional data placement for multicore with duplication (RDPM-DUP), which generate near-optimal data placements that minimize total cost. RDPM keeps exactly one copy of each data item, while RDPM-DUP allows data to be duplicated. Experimental results show that RDPM alone reduces the time cost of memory accesses by 32.68% on average compared with existing algorithms. With data duplication, RDPM-DUP reduces the time cost further, by 40.87%. In terms of energy consumption, RDPM with exclusive copies reduces the total cost by 33.47% on average. When RDPM-DUP is applied, the improvement rises to 38.15% on average.
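To make the difference between exclusive placement and placement with duplication concrete, the following is a minimal sketch of a toy cost model. It is not the RDPM or RDPM-DUP algorithm: it uses exhaustive search (exponential, not polynomial), and the cost constants, the function names, the two-core example, and the omission of write and coherence costs are all assumptions made for illustration only.

```python
from itertools import product

# Hypothetical per-access costs in arbitrary units (not taken from the paper).
LOCAL_SPM, REMOTE_SPM, MAIN_MEMORY = 1, 4, 20


def access_cost(core, locations):
    """Cheapest cost for `core` to read an item whose copies sit in `locations`.

    `locations` is a set of core indices whose SPM holds a copy; an empty
    set means the item lives only in off-chip main memory.
    """
    if core in locations:
        return LOCAL_SPM
    if locations:
        return REMOTE_SPM
    return MAIN_MEMORY


def total_cost(placement, accesses):
    """Sum of access costs, where accesses[item][core] is an access count."""
    return sum(
        count * access_cost(core, placement[item])
        for item, per_core in accesses.items()
        for core, count in enumerate(per_core)
    )


def fits(placement, sizes, capacity, n_cores):
    """True if no core's SPM holds more than `capacity` units of data."""
    for core in range(n_cores):
        used = sum(sizes[i] for i, locs in placement.items() if core in locs)
        if used > capacity:
            return False
    return True


def best_placement(accesses, sizes, capacity, n_cores, allow_duplication):
    """Exhaustively search placements; feasible only for tiny toy inputs."""
    items = list(accesses)
    subsets = [
        frozenset(c for c in range(n_cores) if (mask >> c) & 1)
        for mask in range(1 << n_cores)
    ]
    # Exclusive placement permits at most one SPM copy per item.
    options = [s for s in subsets if allow_duplication or len(s) <= 1]
    best = None
    for choice in product(options, repeat=len(items)):
        placement = dict(zip(items, choice))
        if not fits(placement, sizes, capacity, n_cores):
            continue
        cost = total_cost(placement, accesses)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


# Toy workload: item A is read heavily by both cores, item B rarely by core 0.
accesses = {"A": [10, 10], "B": [1, 0]}
sizes = {"A": 1, "B": 1}

exclusive = best_placement(accesses, sizes, capacity=1, n_cores=2,
                           allow_duplication=False)
duplicated = best_placement(accesses, sizes, capacity=1, n_cores=2,
                            allow_duplication=True)
print(exclusive[0], duplicated[0])  # prints "51 40"
```

In this toy instance, duplicating the heavily shared item A into both SPMs lowers the total cost even though item B is pushed out to main memory. A realistic model would also have to charge for keeping the copies consistent on writes, which this sketch ignores.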
