Exploiting Program Semantics to Place Data in Hybrid Memory

Large-memory applications like data analytics and graph processing benefit from extended memory hierarchies, and hybrid DRAM/NVM (non-volatile memory) systems represent an attractive means by which to increase capacity at reasonable performance/energy tradeoffs. Compared to DRAM, NVMs generally have longer latencies and higher energies for writes, which makes careful data placement essential for efficient system operation. Data placement strategies that resort to monitoring all data accesses and migrating objects to dynamically adjust data locations incur high monitoring overhead and unnecessary memory copies due to mispredicted migrations. We find that program semantics (specifically, global access characteristics) can effectively guide initial data placement with respect to memory types, which, in turn, makes run-time migration more efficient. We study a combined offline/online placement scheme that uses access profiling information to place objects statically and then selectively monitors run-time behaviors to optimize placements dynamically. We present a software/hardware cooperative framework, 2PP, and evaluate it with respect to state-of-the-art migratory placement, finding that it improves performance by an average of 12.1%. Furthermore, 2PP improves energy efficiency by up to 51.8%, and by an average of 18.4%. It does so by reducing run-time monitoring and migration overheads.

[1]  Dong Li,et al.  Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.

[2]  Karthick Rajamani,et al.  Energy Management for Commercial Servers , 2003, Computer.

[3]  Yuanyuan Zhou,et al.  The Multi-Queue Replacement Algorithm for Second Level Buffer Caches , 2001, USENIX Annual Technical Conference, General Track.

[4]  Parijat Dube,et al.  Program behavior characterization in large memory systems , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).

[5]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[6]  Yuan Xie,et al.  Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[7]  K QureshiMoinuddin,et al.  Scalable high performance main memory system using phase-change memory technology , 2009 .

[8]  Karthick Rajamani,et al.  Tiered Memory: An Iso-Power Memory Architecture to Address the Memory Power Wall , 2012, IEEE Transactions on Computers.

[9]  Bruce Jacob,et al.  DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.

[10]  Dong Ye,et al.  Prototyping a hybrid main memory using a virtual machine monitor , 2008, 2008 IEEE International Conference on Computer Design.

[11]  Li Liu,et al.  HMTT: a platform independent full-system memory trace monitoring system , 2008, SIGMETRICS '08.

[12]  Rachata Ausavarungnirun,et al.  Row Buffer Locality-Aware Data Placement in Hybrid Memories , 2011 .

[13]  Berkin Özisikyilmaz,et al.  MineBench: A Benchmark Suite for Data Mining Workloads , 2006, 2006 IEEE International Symposium on Workload Characterization.

[14]  Mircea R. Stan,et al.  Relaxing non-volatility for fast and energy-efficient STT-RAM caches , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[15]  Parijat Dube,et al.  Architectural design for next generation heterogeneous memory systems , 2010, 2010 IEEE International Memory Workshop.

[16]  Vijayalakshmi Srinivasan,et al.  Enhancing lifetime and security of PCM-based Main Memory with Start-Gap Wear Leveling , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[17]  Y.C. Chen,et al.  Write Strategies for 2 and 4-bit Multi-Level Phase-Change Memory , 2007, 2007 IEEE International Electron Devices Meeting.

[18]  Daisuke Takahashi,et al.  The HPC Challenge (HPCC) benchmark suite , 2006, SC.

[19]  Bingsheng He,et al.  NV-Tree: Reducing Consistency Cost for NVM-based Single Level Systems , 2015, FAST.

[20]  Yukio Hayakawa,et al.  An 8 Mb Multi-Layered Cross-Point ReRAM Macro With 443 MB/s Write Throughput , 2012, IEEE Journal of Solid-State Circuits.

[21]  Michael Still The Definitive Guide to ImageMagick , 2005 .

[22]  Rajesh K. Gupta,et al.  NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories , 2011, ASPLOS XVI.

[23]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[24]  Shoji Sakamoto,et al.  An 8Mb multi-layered cross-point ReRAM macro with 443MB/s write throughput , 2012, 2012 IEEE International Solid-State Circuits Conference.

[25]  Rami G. Melhem,et al.  Concurrent page migration for mobile systems with OS-managed hybrid memory , 2014, Conf. Computing Frontiers.

[26]  Tao Li,et al.  Exploring Phase Change Memory and 3D Die-Stacking for Power/Thermal Friendly, Fast and Durable Memory Architectures , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[27]  Jun Yang,et al.  A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.

[28]  Ki-Woong Park,et al.  OPAMP: Evaluation Framework for Optimal Page Allocation of Hybrid Main Memory Architecture , 2012, 2012 IEEE 18th International Conference on Parallel and Distributed Systems.

[29]  Michael M. Swift,et al.  Mnemosyne: lightweight persistent memory , 2011, ASPLOS XVI.

[30]  Ricardo Bianchini,et al.  Page placement in hybrid memory systems , 2011, ICS '11.

[31]  Kaushik Roy,et al.  Energy-Delay Optimization of the STT MRAM Write Operation Under Process Variations , 2014, IEEE Transactions on Nanotechnology.

[32]  Jongman Kim,et al.  An energy- and performance-aware DRAM cache architecture for hybrid DRAM/PCM main memory systems , 2011, 2011 IEEE 29th International Conference on Computer Design (ICCD).

[33]  Yongbing Huang,et al.  A lightweight hybrid hardware/software approach for object-relative memory profiling , 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.

[34]  Jianhua Li,et al.  STT-RAM based energy-efficiency hybrid cache for CMPs , 2011, 2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip.

[35]  Hyokyung Bahn,et al.  Characterizing Memory Write References for Efficient Management of Hybrid PCM and DRAM Memory , 2011, 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems.

[36]  K. Mackay,et al.  Extended scalability and functionalities of MRAM based on thermally assisted writing , 2011, 2011 International Electron Devices Meeting.