A machine learning assisted data placement mechanism for hybrid storage systems

Abstract: Emerging applications produce massive numbers of files that differ in size, lifetime, and read/write frequency. Existing hybrid storage systems place these files on different storage media under the assumption that each file's access pattern is fixed. However, we find that the access pattern of a file changes during its lifetime. The key to improving file access performance is to place files on the hybrid storage system adaptively, using the run-time status and the properties of both the files and the storage devices. In this paper, we propose a machine-learning-assisted data placement mechanism that adaptively places files on the proper storage medium by predicting their access patterns. We design a PMFS-based tracer to collect file access features for prediction and show how this approach adapts to changing access patterns. Based on the access prediction results, we present a linear-time data placement algorithm that optimizes data access performance on the hybrid storage media. Extensive experimental results show that the proposed learning algorithm achieves over 90% accuracy in predicting file access patterns. Moreover, the proposed mechanism improves file access performance by over 17% compared with state-of-the-art linear-time data placement methods.
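The abstract does not spell out the placement algorithm, so the following is only a minimal illustrative sketch of the general idea of prediction-driven, single-pass placement, not the authors' method. The `FileInfo` type, the `predicted_hot` flag (standing in for the output of some access-pattern predictor), the tier names, and the capacity figure are all hypothetical assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    size: int            # file size in arbitrary units (assumed)
    predicted_hot: bool  # hypothetical predictor output: True if frequent access is expected


def place_files(files, fast_capacity):
    """Assign each file to a tier in one pass over the input (linear time).

    Files predicted hot go to the fast tier while it has room;
    everything else goes to the slow tier. This is an assumed policy,
    not the algorithm from the paper.
    """
    placement = {}
    free = fast_capacity
    for f in files:
        if f.predicted_hot and f.size <= free:
            placement[f.name] = "fast"
            free -= f.size
        else:
            placement[f.name] = "slow"
    return placement


# Hypothetical workload: three files predicted hot, one predicted cold.
files = [
    FileInfo("a.log", 40, True),
    FileInfo("b.db", 80, True),
    FileInfo("c.tmp", 10, False),
    FileInfo("d.idx", 50, True),
]
result = place_files(files, fast_capacity=100)
print(result)
```

Because predictions can change over a file's lifetime, a system along these lines would presumably rerun such a pass whenever the predictor is refreshed; how the paper handles migration cost is not stated in the abstract.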
