Variation-tolerant non-uniform 3D cache management in die stacked multicore processor

Process variations in integrated circuits have significant impact on their performance, leakage and stability. This is particularly evident in large, regular and dense structures such as DRAMs. DRAMs are built using minimized transistors with presumably uniform speed in an organized array structure. Process variation can introduce latency disparity among different memory arrays. With the proliferation of 3D stacking technology, DRAMs become a favorable choice for stacking on top of a multicore processor as a last level cache for large capacity, high bandwidth, and low power. Hence, variations in bank speed creates a unique problem of non-uniform cache accesses in 3D space. In this paper, we investigate cache management techniques for tolerating process variation in a 3D DRAM stacked onto a multicore processor. We modeled the process variation in a 4-layer DRAM memory to characterize the latency variations among different banks. As a result, the notion of fast and slow banks from the core's standpoint is no longer associated with their physical distances with the banks. They are determined by the different bank latencies due to process variation. We develop cache migration schemes that utilizes fast banks while limiting the cost due to migration. Our experiments show that there is a great performance benefit in exploiting fast memory banks through migration. On average, a variation-aware management can improve the performance of a workload over the baseline (where one of the slowest bank speed is assumed for all banks) by 17.8%. We are also only 0.45% away in performance from an ideal memory where no process variation is present.

[1]  Yu Cao,et al.  Mapping statistical process variations toward circuit performance variability: an analytical modeling approach , 2005, Proceedings. 42nd Design Automation Conference, 2005..

[2]  Gabriel H. Loh,et al.  3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.

[3]  H LohGabriel 3D-Stacked Memory Architectures for Multi-core Processors , 2008 .

[4]  Rajeev Balasubramonian,et al.  Understanding the Impact of 3D Stacked Layouts on ILP , 2007, J. Instr. Level Parallelism.

[5]  Xiaoxia Wu,et al.  Hybrid cache architecture with disparate memory technologies , 2009, ISCA '09.

[6]  David Blaauw,et al.  Statistical Analysis and Optimization for VLSI: Timing and Power , 2005, Series on Integrated Circuits and Systems.

[7]  Richard E. Matick,et al.  Logic-based eDRAM: Origins and rationale for use , 2005, IBM J. Res. Dev..

[8]  David M. Brooks,et al.  Mitigating the Impact of Process Variations on Processor Register Files and Execution Units , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[9]  Costas J. Spanos,et al.  Modeling within-die spatial correlation effects for process-design co-optimization , 2005, Sixth international symposium on quality electronic design (isqed'05).

[10]  Jung Ho Ahn,et al.  A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies , 2008, 2008 International Symposium on Computer Architecture.

[11]  Doug Burger,et al.  An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.

[12]  J. Torrellas,et al.  VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects , 2008, IEEE Transactions on Semiconductor Manufacturing.

[13]  Jian Xu,et al.  Demystifying 3D ICs: the pros and cons of going vertical , 2005, IEEE Design & Test of Computers.

[14]  Mike Rees,et al.  5. Statistics for Spatial Data , 1993 .

[15]  Josep Torrellas,et al.  Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors , 2008, 2008 International Symposium on Computer Architecture.

[16]  Sani R. Nassif,et al.  Modeling and forecasting of manufacturing variations , 2001, Proceedings of the ASP-DAC 2001. Asia and South Pacific Design Automation Conference 2001 (Cat. No.01EX455).

[17]  Zeshan Chishti,et al.  Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures , 2003, MICRO.

[18]  Kaushik Roy,et al.  Process variation tolerant SRAM array for ultra low voltage applications , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[19]  Sang-Hoon Lee,et al.  An efficient statistical analysis methodology and its application to high-density DRAMs , 1997, 1997 Proceedings of IEEE International Conference on Computer Aided Design (ICCAD).

[20]  Josep Torrellas,et al.  EVAL: Utilizing processors with variation-induced timing errors , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[21]  Lei Jiang,et al.  Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[22]  Gabriel H. Loh,et al.  Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[23]  Sangyeun Cho,et al.  Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[24]  Berkin Özisikyilmaz,et al.  Evaluating the effects of cache redundancy on profit , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[25]  B.C. Paul,et al.  Process variation in embedded memories: failure analysis and variation aware architecture , 2005, IEEE Journal of Solid-State Circuits.

[26]  James D. Meindl,et al.  Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration , 2002, IEEE J. Solid State Circuits.

[27]  Gu-Yeon Wei,et al.  Process Variation Tolerant 3T1D-Based Cache Architectures , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[28]  Tao Li,et al.  NBTI tolerant microarchitecture design in the presence of process variation , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[29]  Kaustav Banerjee,et al.  A thermally-aware performance analysis of vertically integrated (3-D) processor-memory hierarchy , 2006, 2006 43rd ACM/IEEE Design Automation Conference.

[30]  Yu (Kevin) Cao,et al.  What is Predictive Technology Model (PTM)? , 2009, SIGD.

[31]  Krisztián Flautner,et al.  PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor , 2006, ASPLOS XII.

[32]  Gu-Yeon Wei,et al.  ReVIVaL: A Variation-Tolerant Architecture Using Voltage Interpolation and Variable Latency , 2008, 2008 International Symposium on Computer Architecture.

[33]  Dhiraj K. Pradhan,et al.  Failure analysis for ultra low power nano-CMOS SRAM under process variations , 2008, 2008 IEEE International SOC Conference.

[34]  K. Roy,et al.  Design of a Process Variation Tolerant Self-Repairing SRAM for Yield Enhancement in Nanoscaled CMOS , 2007, IEEE Journal of Solid-State Circuits.

[35]  Yiran Chen,et al.  A novel architecture of the 3D stacked MRAM L2 cache for CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[36]  Diana Marculescu,et al.  Variation-aware dynamic voltage/frequency scaling , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[37]  Hai Zhou,et al.  Yield-Aware Cache Architectures , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[38]  Yu Cao,et al.  New Generation of Predictive Technology Model for Sub-45 nm Early Design Exploration , 2006, IEEE Transactions on Electron Devices.

[39]  Martin Burtscher,et al.  Bridging the processor-memory performance gap with 3D IC technology , 2005, IEEE Design & Test of Computers.

[40]  Tao Li,et al.  Soft error vulnerability aware process variation mitigation , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[41]  Kaushik Roy,et al.  A process-tolerant cache architecture for improved yield in nanoscale technologies , 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[42]  Josep Torrellas,et al.  ReCycle:: pipeline adaptation to tolerate process variation , 2007, ISCA '07.

[43]  Kaustav Banerjee,et al.  3-D ICs: a novel chip design for improving deep-submicrometer interconnect performance and systems-on-chip integration , 2001, Proc. IEEE.

[44]  T. Schloesser,et al.  Challenges for the DRAM cell scaling to 40nm , 2005, IEEE InternationalElectron Devices Meeting, 2005. IEDM Technical Digest..

[45]  C. C. McAndrew Statistical Circuit Modeling , 1998 .

[46]  Rajeev Balasubramonian,et al.  Optimizing communication and capacity in a 3D stacked reconfigurable cache hierarchy , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[47]  Sani R. Nassif Modeling and forecasting of manufacturing variations (embedded tutorial) , 2001, ASP-DAC '01.

[48]  Robert Haining,et al.  Statistics for spatial data: by Noel Cressie, 1991, John Wiley & Sons, New York, 900 p., ISBN 0-471-84336-9, US $89.95 , 1993 .

[49]  Zeshan Chishti,et al.  Shapeshifter: Dynamically changing pipeline width and speed to address process variations , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[50]  Mahmut T. Kandemir,et al.  Design and Management of 3D Chip Multiprocessors Using Network-in-Memory , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).