Archivist: A Machine Learning Assisted Data Placement Mechanism for Hybrid Storage Systems

With the rapid growth of edge-cloud computing, emerging applications place higher performance demands on storage systems, which must store massive amounts of data generated from various sources. This multi-sourced data differs in size, retention time, and read/write frequency. Hybrid storage systems promise to handle such data efficiently in edge-cloud computing environments while satisfying these differing demands. The key problem is how to place data on a hybrid storage system according to the run-time status and the properties of both the data and the storage media. In this paper, we propose Archivist, a machine learning assisted data placement mechanism for hybrid storage systems that reduces file access latency. We first design a machine learning based approach for predicting the access patterns of incoming data. Then, we present a data placement algorithm that optimizes the placement of data across the hybrid storage media by matching the properties of the data to the features of the storage media. Extensive experimental results show that Archivist improves file access performance by up to 49% compared with the baseline.
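
To make the idea of matching data properties to storage features concrete, the following is a minimal sketch of a greedy placement step. It is not the algorithm used by Archivist: the names `Tier`, `FileInfo`, and `place`, the latency and capacity fields, and the accesses-per-byte ranking are all assumptions made for illustration, and `predicted_accesses` merely stands in for the output of an access-pattern predictor.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str          # hypothetical tier label, e.g. "ssd" or "hdd"
    latency_ms: float  # assumed average access latency of the tier
    capacity: int      # assumed free capacity in bytes


@dataclass
class FileInfo:
    name: str
    size: int                  # file size in bytes
    predicted_accesses: float  # stand-in for a predictor's output


def place(files, tiers):
    """Greedily assign the hottest files (per byte) to the fastest tier with room."""
    free = {t.name: t.capacity for t in tiers}
    by_speed = sorted(tiers, key=lambda t: t.latency_ms)
    # Rank files by predicted accesses per byte, hottest first.
    ranked = sorted(
        files,
        key=lambda f: f.predicted_accesses / max(f.size, 1),
        reverse=True,
    )
    placement = {}
    for f in ranked:
        for t in by_speed:
            if free[t.name] >= f.size:
                free[t.name] -= f.size
                placement[f.name] = t.name
                break
        else:
            placement[f.name] = None  # no tier has enough free space
    return placement


tiers = [Tier("hdd", 10.0, 1000), Tier("ssd", 0.1, 100)]
files = [
    FileInfo("a", 80, 800.0),
    FileInfo("b", 50, 100.0),
    FileInfo("c", 20, 100.0),
]
result = place(files, tiers)
```

In this toy setup, files `a` and `c` have the highest predicted accesses per byte and together fill the fast tier, so `b` falls back to the slower one. A real mechanism would also need to account for the run-time status and retention time mentioned above, which this sketch ignores.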