Hot Data Identification with Multiple Bloom Filters: Block-Level Decision vs I/O Request-Level Decision

Hot data identification is crucial for many applications, yet few studies have examined the subject. Existing studies focus almost exclusively on frequency; however, identifying hot data effectively requires considering recency as well as frequency. Moreover, previous studies make hot data decisions at the data block level. Such a fine-grained decision suits flash-based storage particularly well because its random-access performance is comparable to its sequential-access performance. Hard disk drives (HDDs), however, show a significant performance gap between sequential and random access. Therefore, unlike flash-based storage, exploiting the asymmetric access performance of HDDs requires a coarse-grained decision. This paper proposes a novel hot data identification scheme that adopts multiple Bloom filters to efficiently capture recency as well as frequency. Compared with a state-of-the-art scheme, it consumes 50% less memory, incurs up to 58% less computational overhead, and lowers false identification rates by up to 65%. Moreover, we apply the scheme to a next-generation HDD technology, Shingled Magnetic Recording (SMR), to verify its effectiveness. To this end, we design a new SMR drive based on hot data identification that makes coarse-grained decisions. Experiments demonstrate the importance and benefits of accurate hot data identification, which improves the performance of the proposed SMR drive by up to 42%.
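The abstract does not spell out the algorithm. Below is a minimal Python sketch of how multiple Bloom filters can capture both frequency and recency, built under stated assumptions. Each access inserts the key into one more filter, so the count of filters holding the key approximates its frequency (capped at V). Periodically clearing the oldest filter and weighting newer filters more heavily captures recency. This is not the paper's implementation. The class name, filter size, SHA-256-based hashing, decay period, linear weights, threshold, and the `keys_for_request` helper are all hypothetical. The helper only illustrates the block-level versus request-level contrast in the title by choosing what is used as the key.

```python
import hashlib


class MultiBloomHotDetector:
    """Hypothetical hot data identifier built from V Bloom filters.

    Frequency: each access inserts the key into one more filter, so the number
    of filters holding the key approximates (and caps at V) its access count.
    Recency: decay() clears the oldest filter, and newer filters carry larger
    weights in the hotness score. All parameters are illustrative assumptions.
    """

    def __init__(self, num_filters=4, bits=1024, num_hashes=2,
                 decay_period=1024, hot_threshold=6):
        assert 1 <= num_hashes <= 8  # a 32-byte SHA-256 digest gives 8 four-byte slices
        self.v = num_filters
        self.m = bits
        self.k = num_hashes
        self.decay_period = decay_period    # accesses between decays (assumed)
        self.hot_threshold = hot_threshold  # weighted-score cutoff (assumed)
        self.filters = [0] * num_filters    # each filter is an int used as a bitmap
        self.current = 0                    # index of the newest filter
        self.accesses = 0

    def _positions(self, key):
        # Derive k bit positions from one SHA-256 digest of the key.
        digest = hashlib.sha256(str(key).encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def _contains(self, idx, key):
        return all((self.filters[idx] >> p) & 1 for p in self._positions(key))

    def record(self, key):
        """Record one write access to key (e.g., an LBA)."""
        # Scan from the newest filter to the oldest and insert the key into the
        # first filter that does not yet contain it; if all do, it stays saturated.
        for offset in range(self.v):
            idx = (self.current - offset) % self.v
            if not self._contains(idx, key):
                for p in self._positions(key):
                    self.filters[idx] |= 1 << p
                break
        self.accesses += 1
        if self.accesses % self.decay_period == 0:
            self.decay()

    def decay(self):
        # The oldest filter is cleared and becomes the newest one.
        self.current = (self.current + 1) % self.v
        self.filters[self.current] = 0

    def score(self, key):
        # The newest filter weighs V and the oldest weighs 1.
        total = 0
        for idx in range(self.v):
            age = (self.current - idx) % self.v
            if self._contains(idx, key):
                total += self.v - age
        return total

    def is_hot(self, key):
        return self.score(key) >= self.hot_threshold


def keys_for_request(start_lba, num_blocks, level):
    """Hypothetical helper: block level yields one key per block, while
    request level yields a single key (here, the start LBA) per I/O request."""
    if level == "block":
        return list(range(start_lba, start_lba + num_blocks))
    if level == "request":
        return [start_lba]
    raise ValueError("level must be 'block' or 'request'")


if __name__ == "__main__":
    detector = MultiBloomHotDetector()
    for _ in range(4):
        for key in keys_for_request(100, 8, "request"):
            detector.record(key)
    detector.record(7)
    print(detector.is_hot(100), detector.is_hot(7))
```

Using bitmaps instead of counters keeps per-filter memory at one bit per slot. Bloom filter false positives can occasionally inflate a cold key's score. A request-level key makes one decision for a whole sequential I/O, which is the coarse-grained behaviour the abstract argues HDDs need.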
