Predicting file lifetimes for data placement in multi-tiered storage systems for HPC

The emergence of Exascale machines in HPC is expected to put more pressure on storage systems, in terms of capacity as well as bandwidth and latency. With a limited budget, building a system entirely from storage class memory is not realistic, which leads to the use of a heterogeneous, tiered storage hierarchy. To make the most efficient use of the high performance tier in this hierarchy, user data must be placed on the right tier at the right time. In this paper, we assume a 2-tier storage hierarchy with a high performance tier and a high capacity archival tier. Files are placed on the high performance tier at creation time and moved to the capacity tier once their lifetime expires (that is, once they are no longer accessed). The main contribution of this paper is the design of a file lifetime prediction model that relies solely on the file's path and is based on a Convolutional Neural Network. Results show that our solution strikes a good trade-off between accuracy and underestimation. Our model reaches an accuracy close to that of previous work (around 98.60% compared to 98.84%) while reducing underestimations by almost 10x, to 2.21% (compared to 21.86%). This reduction is crucial because it avoids moving files to the capacity tier while they are still in use.
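To make the placement rule and the notion of underestimation concrete, the following is a minimal Python sketch and not the paper's implementation. The `FileRecord` type, the `tier_at` and `underestimation_rate` helpers, and the use of seconds as the time unit are all assumptions introduced for illustration. The predicted lifetime is taken as given, and the paper's exact metric definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical record for one file; field names are illustrative."""
    path: str
    created_at: float          # creation time, in seconds
    predicted_lifetime: float  # seconds, produced by some lifetime predictor
    actual_lifetime: float     # seconds between creation and last access


def tier_at(record: FileRecord, now: float) -> str:
    """Tier chosen by the 2-tier policy at time `now`.

    A file stays on the performance tier until its predicted lifetime
    expires, then it is moved to the capacity tier.
    """
    if now < record.created_at + record.predicted_lifetime:
        return "performance"
    return "capacity"


def underestimation_rate(records: list[FileRecord]) -> float:
    """Fraction of files whose predicted lifetime is shorter than the actual one.

    These are the files that would be moved to the capacity tier
    while they are still being accessed.
    """
    if not records:
        return 0.0
    under = sum(1 for r in records if r.predicted_lifetime < r.actual_lifetime)
    return under / len(records)
```

With this definition, an overestimated lifetime only wastes space on the performance tier, whereas an underestimated one evicts a file that is still in use, which is why the two error types are reported separately.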
