Data Prefetching for Large Tiered Storage Systems

In multi-tier storage systems that hold large amounts of data, most of the data resides on inexpensive, slower tiers such as cloud or tape to reduce cost. As a consequence, retrieving data from these slower tiers incurs high latency. It is therefore beneficial to proactively prefetch data from slower tiers to faster tiers by predicting future data accesses. State-of-the-art access prediction methods typically record the access history of individual files, data objects, or data segments. However, in systems with large amounts of infrequently accessed (cold) data, file-level access history is often unavailable for much of the data because it is accessed so rarely. In this paper, we extract information from file metadata to predict file accesses in a storage system. The proposed method relies on the hypothesis that users and applications access data in a given context, and that this context, and therefore the set of files likely to be accessed, can be identified by detecting access patterns in file metadata. As an application, we consider the long-term archive of the LOFAR radio telescope, where access patterns are learned from a rich set of metadata and then used to predict likely future accesses by astronomers.
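
The hypothesis above can be illustrated with a small sketch. It is not the method of the paper, only one simple way to turn "context detected from metadata" into a prefetch list: metadata attribute-value pairs shared by recent accesses are treated as the current context, and files on the slow tier are ranked by how many of those pairs they match. The field names (`project`, `obs`, `type`), the file names, the `min_share` threshold, and both function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def infer_context(recent_accesses, min_share=0.5):
    """Return the metadata (key, value) pairs that appear in at least
    min_share of the recently accessed files (hypothetical heuristic)."""
    counts = Counter()
    for meta in recent_accesses:
        counts.update(meta.items())
    threshold = min_share * len(recent_accesses)
    return {pair for pair, n in counts.items() if n >= threshold}


def rank_prefetch_candidates(context, catalog, already_cached, top_k=2):
    """Rank files not yet on the fast tier by the number of context
    pairs their metadata matches; return the top_k file names."""
    scored = []
    for name, meta in catalog.items():
        if name in already_cached:
            continue
        score = len(context & set(meta.items()))
        if score > 0:
            scored.append((score, name))
    # Highest score first; ties are broken by file name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical metadata catalog; no access history is needed for cold files.
catalog = {
    "a.ms": {"project": "P1", "obs": "O1", "type": "vis"},
    "b.ms": {"project": "P1", "obs": "O1", "type": "vis"},
    "c.ms": {"project": "P1", "obs": "O2", "type": "vis"},
    "d.ms": {"project": "P2", "obs": "O9", "type": "img"},
    "e.ms": {"project": "P1", "obs": "O1", "type": "vis"},
}

# Two files were just read; both are now on the fast tier.
recent = [catalog["a.ms"], catalog["b.ms"]]
context = infer_context(recent)
prefetch = rank_prefetch_candidates(context, catalog, {"a.ms", "b.ms"})
print(prefetch)
```

In this toy catalog, `e.ms` shares all three context pairs with the recent accesses and `c.ms` shares two, so they are ranked ahead of `d.ms`, which shares none. The point of the sketch is that the ranking uses only metadata, so it also covers files that have never been accessed before.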
