Buffer-Integrated-Cache: A cost-effective SRAM architecture for handheld and embedded platforms

In an SoC, building local storage in each accelerator is area-inefficient because the average utilization of that storage is low. In this paper, we present the design and implementation of Buffer-integrated-Caching (BiC), which allows many buffers to be instantiated simultaneously in caches. BiC enables cores to view portions of the SRAM as cache while accelerators access other portions of the SRAM as private buffers. We demonstrate the cost-effectiveness of BiC on a recognition MPSoC that includes two Pentium™ cores, an augmented reality accelerator, and a speech recognition accelerator. With 3% extra area added to the baseline L2 cache, BiC eliminates the need to build 215KB of dedicated SRAM for the accelerators, while increasing total cache misses by no more than 0.3%.
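The idea of cores and accelerators sharing one SRAM array can be illustrated with a toy model. The sketch below is not the paper's design: the per-line buffer flag, the class name `SharedSram`, the methods `allocate_buffer`, `free_buffer`, and `access`, the rule that way 0 is never lent out, and the naive replacement policy are all assumptions made for illustration only.

```python
class SharedSram:
    """Toy model of one SRAM array shared by a cache and accelerator buffers.

    Hypothetical sketch: each line carries a flag saying whether it is
    currently lent to an accelerator as private buffer space.
    """

    def __init__(self, num_sets, num_ways):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        self.is_buffer = [[False] * num_ways for _ in range(num_sets)]
        self.misses = 0

    def allocate_buffer(self, num_lines):
        """Lend num_lines cache lines to an accelerator; return their slots."""
        # Assumption: way 0 always stays cacheable so every set keeps one line.
        free = [(s, w)
                for w in range(self.num_ways - 1, 0, -1)
                for s in range(self.num_sets)
                if not self.is_buffer[s][w]]
        if len(free) < num_lines:
            raise MemoryError("not enough SRAM lines available for the buffer")
        slots = free[:num_lines]
        for s, w in slots:
            self.is_buffer[s][w] = True
            self.tags[s][w] = None  # any cached data in the line is dropped
        return slots

    def free_buffer(self, slots):
        """Return buffer lines to the cache."""
        for s, w in slots:
            self.is_buffer[s][w] = False

    def access(self, line_addr):
        """Core-side cache access; returns True on a hit, False on a miss."""
        s, tag = line_addr % self.num_sets, line_addr // self.num_sets
        ways = [w for w in range(self.num_ways) if not self.is_buffer[s][w]]
        if any(self.tags[s][w] == tag for w in ways):
            return True
        self.misses += 1
        # Deliberately naive replacement: first empty way, else the first way.
        empty = [w for w in ways if self.tags[s][w] is None]
        victim = empty[0] if empty else ways[0]
        self.tags[s][victim] = tag
        return False
```

In this model, lending lines to a buffer reduces the associativity seen by the cores, which is one way to picture why buffer allocation can raise the cache miss count; the magnitude of that effect in the toy model says nothing about the 0.3% figure reported above.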
