Circuit design of a dual-versioning L1 data cache for optimistic concurrency

This paper proposes a novel L1 data cache design with dual-versioning SRAM cells (dvSRAM) for chip multiprocessors (CMPs) that implement optimistic concurrency proposals. In this cache architecture, each dvSRAM cell consists of two cells, a main cell and a secondary cell, which keep two versions of the same data. These values can be accessed, modified, and moved back and forth between the main and secondary cells within the access time of the cache. We design and simulate a 32-KB dual-versioning L1 data cache in 45-nm CMOS technology at a 2 GHz processor frequency and a 1 V supply voltage, and we describe the design in detail. We also introduce three well-known use cases that rely on optimistic concurrency and can benefit from the proposed design. Moreover, we evaluate one of the use cases to show the impact of the dual-versioning cell on both performance and energy consumption. Our experiments show that large speedups can be achieved with acceptable overall energy dissipation.
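
As a purely behavioral illustration of the dual-versioning idea, the sketch below models one cache line whose data exists in a main and a secondary version, with copies in both directions. It is not the circuit described in the paper. The class name `DualVersionLine`, its method names, the 64-byte line size, and the checkpoint-then-rollback usage are all assumptions chosen for illustration.

```python
class DualVersionLine:
    """Hypothetical behavioral model of one cache line built from dual-versioning cells."""

    def __init__(self, size_bytes=64):  # 64-byte line size is an assumption
        self.main = bytearray(size_bytes)       # version seen by ordinary loads and stores
        self.secondary = bytearray(size_bytes)  # second version of the same data

    def read(self, offset):
        # Ordinary load: served from the main version.
        return self.main[offset]

    def write(self, offset, value):
        # Ordinary store: updates the main version only.
        self.main[offset] = value & 0xFF

    def store_to_secondary(self):
        # Copy main -> secondary, e.g. to keep the old value before speculative writes.
        self.secondary[:] = self.main

    def restore_from_secondary(self):
        # Copy secondary -> main, e.g. to undo speculative writes.
        self.main[:] = self.secondary


line = DualVersionLine()
line.write(0, 7)
line.store_to_secondary()      # keep the pre-speculation value
line.write(0, 42)              # speculative update
speculative = line.read(0)
line.restore_from_secondary()  # roll back
restored = line.read(0)
```

In this model, a rollback is a single in-place copy rather than a reload from a lower cache level, which is the kind of saving the abstract attributes to moving values between the two cells within the cache access time.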
