DESTINY: A Comprehensive Tool with 3D and Multi-Level Cell Memory Modeling Capability

To enable the design of large-capacity memory structures, novel memory technologies such as non-volatile memory (NVM) and novel fabrication approaches such as 3D stacking and multi-level cell (MLC) design have been explored. Existing modeling tools, however, cover only a few memory technologies, technology nodes, and fabrication approaches. We present DESTINY, a tool for modeling 2D/3D memories designed using SRAM, resistive RAM (ReRAM), spin transfer torque RAM (STT-RAM), phase change RAM (PCM), and embedded DRAM (eDRAM), as well as 2D memories designed using spin orbit torque RAM (SOT-RAM), domain wall memory (DWM), and Flash memory. In addition to single-level cell (SLC) designs for all these memories, DESTINY also supports modeling MLC designs for NVMs. We have extensively validated DESTINY against commercial and research prototypes of these memories. DESTINY supports design-space exploration across several dimensions, such as optimizing for a target (e.g., latency, area, or energy-delay product) for a given memory technology, or choosing a suitable memory technology or fabrication method (i.e., 2D vs. 3D) for a given optimization target. We believe that DESTINY will boost studies of next-generation memory architectures used in systems ranging from mobile devices to extreme-scale supercomputers.
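The kind of design-space exploration described above can be illustrated with a minimal Python sketch. It is not DESTINY's interface or implementation: the `Candidate` class, the `OBJECTIVES` table, and the `best` function are hypothetical names, and every number is a made-up placeholder rather than a modeled or measured result. The sketch only shows how one might pick, among candidate configurations, the one that minimizes a chosen target such as latency, area, or energy-delay product (energy multiplied by delay).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical memory configuration (all values are placeholders)."""
    technology: str
    layers: int          # 1 means 2D; more than 1 means 3D-stacked
    latency_ns: float
    energy_nj: float
    area_mm2: float

    @property
    def edp(self) -> float:
        # Energy-delay product: access energy times access latency.
        return self.energy_nj * self.latency_ns


# Maps an optimization target name to the metric that should be minimized.
OBJECTIVES = {
    "latency": lambda c: c.latency_ns,
    "area": lambda c: c.area_mm2,
    "edp": lambda c: c.edp,
}


def best(candidates, target):
    """Return the candidate with the smallest value of the chosen target."""
    return min(candidates, key=OBJECTIVES[target])


# Illustrative inputs only; these are NOT outputs of any modeling tool.
candidates = [
    Candidate("SRAM", 1, 2.0, 1.0, 8.0),
    Candidate("STT-RAM", 1, 4.0, 0.4, 3.0),
    Candidate("STT-RAM", 4, 3.0, 0.5, 1.0),
    Candidate("ReRAM", 2, 6.0, 0.3, 1.5),
]

for target in OBJECTIVES:
    choice = best(candidates, target)
    print(f"{target}: {choice.technology}, {choice.layers} layer(s)")
```

Keeping the objective as a lookup table means a further target can be added without changing the selection logic, and restricting the candidate list to a single technology corresponds to optimizing within that technology rather than across technologies.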
