SP-TSRM: A Data Grouping Strategy in Distributed Storage System

With the development of smart devices and social media, massive amounts of unstructured data are uploaded to distributed storage systems. Because accesses to unstructured data involve many users and high concurrency, they pose new challenges for traditional distributed storage systems, which were designed for large files. We propose a grouping strategy that identifies data accessed together, based on disk access logs from a real distributed storage environment. When any data in a group is accessed, the whole group is prefetched from disk to the cache. First, we conduct a statistical analysis of the access logs and propose a preliminary classification method that classifies files by spatiotemporal locality. Second, we propose a strength-priority tree structure relation model (SP-TSRM) to mine file groups efficiently. Finally, experiments show that the proposed model significantly improves the cache hit rate, thereby improving the read efficiency of distributed storage systems.
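The group-prefetch behavior described above can be illustrated with a minimal sketch. It is not the SP-TSRM mining algorithm: the file groups are assumed to be given as input, the cache is assumed to use LRU eviction, and prefetching is assumed to happen on a miss. The names `GroupPrefetchCache`, `hit_rate`, `capacity`, and `groups` are hypothetical and do not come from the paper.

```python
from collections import OrderedDict


class GroupPrefetchCache:
    """LRU cache that loads a file's whole group on a miss (illustrative only)."""

    def __init__(self, capacity, groups):
        self.capacity = capacity
        self.cache = OrderedDict()  # file id -> None, ordered by recency
        # Assumption: groups are precomputed elsewhere (the paper mines them
        # from access logs; that step is not reproduced here).
        self.group_of = {f: tuple(g) for g in groups for f in g}
        self.hits = 0
        self.misses = 0

    def _insert(self, file_id):
        self.cache[file_id] = None
        self.cache.move_to_end(file_id)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def access(self, file_id):
        """Return True on a cache hit, False on a miss."""
        if file_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(file_id)
            return True
        self.misses += 1
        # Prefetch the other group members first, then the requested file,
        # so the requested file ends up as the most recently used entry.
        for member in self.group_of.get(file_id, (file_id,)):
            if member != file_id:
                self._insert(member)
        self._insert(file_id)
        return False


def hit_rate(trace, capacity, groups):
    """Replay an access trace and return the fraction of hits."""
    cache = GroupPrefetchCache(capacity, groups)
    for file_id in trace:
        cache.access(file_id)
    return cache.hits / len(trace)
```

Replaying the same trace with and without groups (for example `hit_rate(trace, 3, [["a", "b", "c"]])` versus `hit_rate(trace, 3, [])`) gives a rough sense of how grouping can change the hit rate under these assumptions.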
