An energy-efficient memory hierarchy for multi-issue processors

Embedded processors rely on instruction-level parallelism to meet the performance and energy demands of modern applications. However, memory bandwidth limits how well the processor's available resources can be used, and adding memory ports to allow more simultaneous data accesses drastically increases cost and energy consumption. In this paper, we present a novel memory architecture for embedded multi-issue processors that overcomes the limited memory bandwidth without adding ports to the system. We combine software-managed memories (SMMs) with the data cache to provide higher throughput at the same port count. Compiler-automated code transformations minimize the programmer effort needed to benefit from the proposed architecture. Compared with a baseline processor, our experimental results show an average speedup of 1.17x with 69% less dynamic energy and, on average, a 74.7% lower energy-delay product in the data memory.
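As a rough consistency check on the reported figures, the energy-delay product (EDP) can be related to the speedup and the energy reduction. The symbols below (E for data-memory dynamic energy, D for execution time, S for speedup) are notation assumed here for illustration, not taken from the paper, and the calculation combines averaged figures rather than per-benchmark results.

```latex
\[
\mathrm{EDP} = E \cdot D,
\qquad
\frac{\mathrm{EDP}_{\text{new}}}{\mathrm{EDP}_{\text{base}}}
  = \frac{E_{\text{new}}}{E_{\text{base}}} \cdot \frac{D_{\text{new}}}{D_{\text{base}}}
  = \frac{1 - 0.69}{S}
  = \frac{0.31}{1.17}
  \approx 0.265
\]
```

This estimate corresponds to an EDP roughly 73.5% lower than the baseline. The reported 74.7% is presumably an average of per-benchmark EDP reductions, so it need not match the product of the averaged figures exactly.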
