Building and Optimizing MRAM-Based Commodity Memories

Emerging non-volatile memory technologies such as MRAM are promising design solutions for energy-efficient memory architecture, especially for mobile systems. However, building commodity MRAM by reusing DRAM designs is not straightforward. The existing memory interfaces are incompatible with MRAM small page size, and they fail to leverage MRAM unique properties, causing unnecessary performance and energy overhead. In this article, we propose four techniques to enable and optimize an LPDDRx-compatible MRAM solution: ComboAS to solve the pin incompatibility; DynLat to avoid unnecessary access latencies; and EarlyPA and BufW to further improve performance by exploiting the MRAM unique features of non-destructive read and independent write path. Combining all these techniques together, we boost the MRAM performance by 17% and provide a DRAM-compatible MRAM solution consuming 21% less energy.

[1]  Sai Prashanth Muralidhara,et al.  Reducing memory interference in multicore systems via application-aware memory channel partitioning , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[2]  Y. J. Lee,et al.  Extended scalability of perpendicular STT-MRAM towards sub-20nm MTJ node , 2011, 2011 International Electron Devices Meeting.

[3]  Naehyuck Chang,et al.  Energy- and endurance-aware design of phase change memory caches , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[4]  Ran Duan,et al.  Exploring memory energy optimizations in smartphones , 2011, 2011 International Green Computing Conference and Workshops.

[5]  Yoshihiro Ueda,et al.  A 64Mb MRAM with clamped-reference and adequate-reference schemes , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[6]  Hideto Hidaka,et al.  The cache DRAM architecture: a DRAM with an on-chip cache memory , 1990, IEEE Micro.

[7]  Jung Ho Ahn,et al.  Corona: System Implications of Emerging Nanophotonic Technology , 2008, 2008 International Symposium on Computer Architecture.

[8]  Moinuddin K. Qureshi,et al.  A case for Refresh Pausing in DRAM memory systems , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[9]  Yuanyuan Zhou,et al.  DMA-aware memory energy management , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[10]  Wei Xu,et al.  Data manipulation techniques to reduce phase change memory write energy , 2009, ISLPED.

[11]  David W. Nellans,et al.  Micro-pages: increasing DRAM efficiency with locality-aware data placement , 2010, ASPLOS XV.

[12]  Gabriel H. Loh,et al.  3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.

[13]  Kang G. Shin,et al.  Improving energy efficiency by making DRAM less randomly accessed , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005..

[14]  José F. Martínez,et al.  Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems , 2013, ISCA.

[15]  Richard Veras,et al.  RAIDR: Retention-aware intelligent DRAM refresh , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[16]  Onur Mutlu,et al.  Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems , 2008, 2008 International Symposium on Computer Architecture.

[17]  Yiran Chen,et al.  A novel architecture of the 3D stacked MRAM L2 cache for CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[18]  Mircea R. Stan,et al.  Relaxing non-volatility for fast and energy-efficient STT-RAM caches , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[19]  Zhao Zhang,et al.  Cached DRAM for ILP Processor Memory Access Latency Reduction , 2001, IEEE Micro.

[20]  Ricardo Bianchini,et al.  Page placement in hybrid memory systems , 2011, ICS '11.

[21]  William J. Dally,et al.  Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[22]  Norman P. Jouppi,et al.  Rethinking DRAM design and organization for energy-constrained multi-cores , 2010, ISCA.

[23]  Jun Yang,et al.  A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.

[24]  Norman P. Jouppi,et al.  Staged Reads: Mitigating the impact of DRAM writes on DRAM reads , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[25]  Christopher Batten,et al.  Re-architecting DRAM memory systems with monolithically integrated silicon photonics , 2010, ISCA.

[26]  Seung-Yun Lee,et al.  A Low Power Phase-Change Random Access Memory using a Data-Comparison Write Scheme , 2007, 2007 IEEE International Symposium on Circuits and Systems.

[27]  Zhao Zhang,et al.  A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality , 2000, MICRO 33.

[28]  Calvin Lin,et al.  A comprehensive approach to DRAM power management , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[29]  Bruce Jacob,et al.  DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.

[30]  Wenqing Wu,et al.  Multi retention level STT-RAM cache designs with a dynamic refresh scheme , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[31]  Chris Fallin,et al.  Parallel application memory scheduling , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[32]  Jung Ho Ahn,et al.  A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies , 2008, 2008 International Symposium on Computer Architecture.

[33]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[34]  Christoforos E. Kozyrakis,et al.  Towards energy-proportional datacenter memory with mobile DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[35]  J. Slaughter,et al.  A Fully Functional 64 Mb DDR3 ST-MRAM Built on 90 nm CMOS Technology , 2013, IEEE Transactions on Magnetics.

[36]  Mahmut T. Kandemir,et al.  Evaluating STT-RAM as an energy-efficient main memory alternative , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[37]  Onur Mutlu,et al.  A case for exploiting subarray-level parallelism (SALP) in DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[38]  Jung Ho Ahn,et al.  Multicore DIMM: an Energy Efficient Memory Module with Independently Controlled DRAMs , 2009, IEEE Computer Architecture Letters.

[39]  Hyunjin Lee,et al.  Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy and endurance , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[40]  Jichuan Chang,et al.  BOOM: Enabling mobile memory based low-power server DIMMs , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[41]  Alvin R. Lebeck,et al.  Power aware page allocation , 2000, SIGP.

[42]  Sanjeev Kumar,et al.  Dynamic tracking of page miss ratio curve for memory management , 2004, ASPLOS XI.

[43]  Zhao Zhang,et al.  Mini-rank: Adaptive DRAM architecture for improving memory power efficiency , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[44]  Vijayalakshmi Srinivasan,et al.  Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.

[45]  Cong Xu,et al.  NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[46]  Jing Li,et al.  A case for small row buffers in non-volatile main memories , 2012, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[47]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[48]  Mor Harchol-Balter,et al.  ATLAS : A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers , 2010 .

[49]  Jung-Bae Lee,et al.  A 1.2V 30nm 1.6Gb/s/pin 4Gb LPDDR3 SDRAM with input skew calibration and enhanced control scheme , 2012, 2012 IEEE International Solid-State Circuits Conference.

[50]  A. Driskill-Smith,et al.  Fully integrated 54nm STT-RAM with the smallest bit cell dimension for high density memory application , 2010, 2010 International Electron Devices Meeting.