Exploiting static and dynamic locality of timing errors in robust L1 cache design

Process variation (PV) is a major reliability concern in the semiconductor industry as technology nodes continue to shrink. As a crucial component of modern processors, the cache is vulnerable to PV-induced timing errors because of its large scale and low logic path depth. To tolerate these timing errors in caches, asymmetric pipelining has been employed; it has low implementation cost but induces unnecessary latency overhead and thus degrades the performance of the whole processor. In this paper, we propose a novel approach that applies variable latency to L1 cache accesses, significantly reducing the performance overhead of tolerating PV-induced timing errors. Our results show that the performance loss of our approach on processors with low, medium, and high error-rate L1 caches is 0.1%, 1.5%, and 3.5%, respectively, while the area and power overheads of our approach are 3.1% and 2.8%, respectively.
