Energy-Efficient Data Caching Framework for Spark in Hybrid DRAM/NVM Memory Architectures

In Spark, a typical in-memory big data computing framework, the overwhelming majority of memory is used for caching data. Among the cached data, inactive and suspended data account for a large portion during execution. These data remain in memory until they are evicted or accessed again. During that period, DRAM consumes a large amount of refresh energy to maintain these low-benefit data. This energy waste can be avoided by using NVM as an alternative. NVM also has a smaller cell size, so it offers more in-memory capacity for cached data and avoids disk accesses that a DRAM-only configuration would incur. However, NVM cannot completely replace DRAM, because DRAM is superior in access latency and endurance. Hybrid DRAM/NVM memory architectures therefore become the best option and are a promising way to resolve the memory capacity and energy consumption dilemmas of in-memory big data computing systems. Based on this observation, we propose a data caching framework for Spark in a hybrid DRAM/NVM memory configuration. Data access behavior is identified using two metrics, the active factor and the active stage distance: cached data with higher local I/O activity is preferentially placed in DRAM, while cached data with lower activity is placed in NVM. A data migration strategy dynamically moves cold data from DRAM to NVM to save static energy. The results show that the proposed framework reduces energy consumption by about 73.2% and improves latency by up to 20.9%.
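The placement and migration idea described above can be illustrated with a minimal Python sketch. The abstract does not define the active factor or the active stage distance, so the formulas below are assumptions made only for illustration: the active factor is taken as accesses per elapsed stage, and the active stage distance as the number of stages until the next expected use. The names `CachedBlock`, `choose_tier`, `blocks_to_migrate`, and both thresholds are hypothetical and are not taken from the paper or from Spark.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CachedBlock:
    block_id: str
    access_count: int               # accesses observed so far
    stages_elapsed: int             # stages since the block was cached
    next_use_stage: Optional[int]   # next stage expected to read it, if known


def active_factor(block: CachedBlock) -> float:
    # Assumed definition: average accesses per elapsed stage.
    return block.access_count / max(block.stages_elapsed, 1)


def active_stage_distance(block: CachedBlock, current_stage: int) -> float:
    # Assumed definition: stages until the next expected use.
    if block.next_use_stage is None:
        return float("inf")
    return max(block.next_use_stage - current_stage, 0)


def choose_tier(block: CachedBlock, current_stage: int,
                af_threshold: float = 1.0,
                distance_threshold: int = 2) -> str:
    # Hot data (frequently accessed and needed soon) goes to DRAM;
    # everything else goes to NVM. Thresholds are illustrative.
    hot = (active_factor(block) >= af_threshold and
           active_stage_distance(block, current_stage) <= distance_threshold)
    return "DRAM" if hot else "NVM"


def blocks_to_migrate(placement: Dict[str, str],
                      blocks: List[CachedBlock],
                      current_stage: int) -> List[str]:
    # Blocks currently in DRAM that have turned cold and should move to NVM.
    return [b.block_id for b in blocks
            if placement.get(b.block_id) == "DRAM"
            and choose_tier(b, current_stage) == "NVM"]


hot_block = CachedBlock("rdd_1", access_count=6, stages_elapsed=3, next_use_stage=5)
cold_block = CachedBlock("rdd_2", access_count=1, stages_elapsed=4, next_use_stage=None)
placement = {"rdd_1": "DRAM", "rdd_2": "DRAM"}
moves = blocks_to_migrate(placement, [hot_block, cold_block], current_stage=4)
```

In this sketch the cold block is selected for migration because it is rarely accessed and has no known upcoming use, while the hot block stays in DRAM. A real policy would also have to account for tier capacity and migration cost, which are omitted here.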
