Self-learnable Cluster-based Prefetching Method for DRAM-Flash Hybrid Main Memory Architecture

This article presents a novel prefetching mechanism for memory-intensive workloads in large-scale data centers. We design a hybrid main memory architecture that combines NAND flash with dynamic random-access memory (DRAM) as a cost-effective way to address the scalability and power consumption problems of a DRAM-based model. To maximize the performance of the DRAM, we design a smart prefetching mechanism, built on a cluster-management scheme, that copes with the dynamically varying and complex access patterns of a given application. We propose a new unit of page management, called a cluster, which is used to prefetch data in the hybrid memory architecture. Cluster management relies on a self-learning scheme that tracks dynamically changing access patterns by considering correlations between missed pages. Experimental results show that overall performance improves significantly in terms of hit rate, execution time, and energy consumption. Specifically, the proposed model improves the hit rate by 15% and shortens execution time by a factor of 1.75. In addition, it reduces energy consumption by around 48% by cutting the number of flushed pages to about an eighth of that in a conventional system.
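The abstract does not specify how clusters are formed, so the following is only a minimal sketch of the general idea of grouping pages by correlation between misses. It is not the authors' algorithm. The class name `ClusterPrefetcher`, the `window` and `threshold` parameters, and the symmetric co-miss counting rule are all assumptions introduced for illustration.

```python
from collections import defaultdict, deque


class ClusterPrefetcher:
    """Toy miss-correlation prefetcher (hypothetical; not the paper's design)."""

    def __init__(self, window=2, threshold=2):
        # Assumption: two pages are "correlated" when they miss within
        # `window` misses of each other at least `threshold` times.
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # most recent missed pages
        self.co_miss = defaultdict(lambda: defaultdict(int))

    def record_miss(self, page):
        """Learn from a DRAM miss; return the pages to prefetch with it."""
        for prev in self.recent:
            if prev != page:
                # Count the pair in both directions (symmetric correlation).
                self.co_miss[prev][page] += 1
                self.co_miss[page][prev] += 1
        self.recent.append(page)
        return self.cluster_of(page)

    def cluster_of(self, page):
        """Pages whose co-miss count with `page` has reached the threshold."""
        counts = self.co_miss.get(page, {})
        return sorted(p for p, n in counts.items() if n >= self.threshold)


if __name__ == "__main__":
    prefetcher = ClusterPrefetcher(window=1, threshold=2)
    for missed_page in [10, 11, 50, 51, 10, 11]:
        prefetcher.record_miss(missed_page)
    print(prefetcher.cluster_of(10))  # [11]
```

In this sketch, pages 10 and 11 miss back to back twice, so a later miss on page 10 would also fetch page 11 from flash. A real design would additionally need to bound the table size and age out stale correlations as access patterns change.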
