FlowPaP and FlowReR

Handheld devices, such as smartphones and tablets, currently dominate the semiconductor market. The memory access patterns of CPU and IP cores differ dramatically in a handheld device, making main memory a critical bottleneck of the entire system. As a result, non-volatile memories such as spin-transfer torque magnetoresistive random-access memory (STT-MRAM) are emerging as replacements for DRAM-based main memory, offering a wide variety of advantages. However, replacing DRAM with STT-MRAM also introduces new design challenges, including read disturbance. A simple read-and-restore scheme preserves data integrity under read disturbance but incurs significant performance and energy overheads. Exploiting unique characteristics of mobile applications, we propose FlowPaP, a flow pattern prediction scheme that dynamically predicts the write-to-last-read distances for data frames running on a handheld device. FlowPaP identifies and removes unnecessary memory restores originally required to prevent read disturbance, significantly improving energy efficiency and performance for STT-MRAM-based handheld devices. In addition, we propose FlowReR, a flow-based data retention time reduction scheme that further lowers the energy consumption of STT-MRAM at the expense of a shorter data retention time. FlowReR imposes a second step that marginally trades off the already improved energy efficiency for performance improvements. Experimental results show that, compared to the original read-and-restore scheme, applying FlowPaP and FlowReR together improves energy efficiency by 34% and performance by 17% for a set of commonly used Android applications.
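
To make the idea of removing unnecessary restores concrete, the following is a minimal sketch and not the FlowPaP design itself, whose predictor is not described here. It assumes that the write-to-last-read distance can be modeled as the number of reads a frame receives before it is overwritten, and that a simple last-value predictor per flow is sufficient. The names `LastReadPredictor`, `on_write`, `on_read`, and `simulate` are hypothetical and introduced only for illustration.

```python
from collections import defaultdict


class LastReadPredictor:
    """Toy last-value predictor (an illustrative assumption, not the paper's scheme)."""

    def __init__(self):
        self.predicted = {}                  # flow id -> predicted reads per frame
        self.reads_seen = defaultdict(int)   # flow id -> reads since the last write

    def on_write(self, flow):
        # A new frame overwrites the old one: the read count observed for the
        # previous frame becomes the prediction for the next frame.
        if self.reads_seen[flow] > 0:
            self.predicted[flow] = self.reads_seen[flow]
        self.reads_seen[flow] = 0

    def on_read(self, flow):
        """Return True if this read should be followed by a restore."""
        self.reads_seen[flow] += 1
        predicted = self.predicted.get(flow)
        if predicted is None:
            # No history yet: stay conservative and restore.
            return True
        # Skip the restore only on the read predicted to be the last one,
        # since the frame is expected to be overwritten before it is read again.
        return self.reads_seen[flow] != predicted


def simulate(ops, flow="display"):
    """Replay a string of 'W' (write) and 'R' (read) operations for one flow.

    Returns one boolean per read: True means restore, False means skip.
    """
    predictor = LastReadPredictor()
    decisions = []
    for op in ops:
        if op == "W":
            predictor.on_write(flow)
        elif op == "R":
            decisions.append(predictor.on_read(flow))
    return decisions


# Two frames, each read three times before being overwritten.
restore_decisions = simulate("WRRRWRRRW")
```

In this trace the first frame is restored after every read because no prediction exists yet, while the third read of the second frame is predicted to be the last and its restore is skipped. The sketch ignores mispredictions: if a frame is read again after a skipped restore, the data may already be disturbed, so a real design would need a recovery or fallback policy.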
