A client-side directory prefetching mechanism for GlusterFS

Distributed file systems offer large capacity, good scalability, and high reliability, which has made them widely used wherever large-scale data storage is needed. They give users simplified, highly available access to data. However, distributed file systems built on a non-metadata design, that is, without a dedicated metadata server, perform poorly when traversing large directories. As the number of files grows, this severely degrades the user experience. In this paper, we present a client-side directory prefetching mechanism that reduces the latency of directory traversal in non-metadata distributed file systems. The mechanism, combined with the client's cache, uses the directory access history to predict future access patterns and fetches directory contents without user intervention. Our goal is to reduce overall access latency in non-metadata distributed file systems and thereby improve the user experience.
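To make the idea concrete, the following is a minimal sketch of history-driven directory prefetching backed by a client cache. It is not the mechanism evaluated in this paper: the abstract does not specify the predictor or the cache policy, so the sketch assumes a simple "most frequent successor" predictor and an LRU cache. The class `DirectoryPrefetcher`, the `fetch` callback (standing in for a remote directory listing), and the `capacity` parameter are all hypothetical names introduced for illustration.

```python
from collections import Counter, OrderedDict, defaultdict
from typing import Callable, Dict, List, Optional


class DirectoryPrefetcher:
    """Illustrative history-driven prefetcher with an LRU client cache."""

    def __init__(self, fetch: Callable[[str], List[str]], capacity: int = 128):
        self.fetch = fetch            # hypothetical remote directory listing call
        self.capacity = capacity      # maximum number of cached directories
        self.cache: "OrderedDict[str, List[str]]" = OrderedDict()
        # Access history: for each directory, how often each other
        # directory was listed immediately after it (assumed predictor).
        self.successors: Dict[str, Counter] = defaultdict(Counter)
        self.last: Optional[str] = None
        self.hits = 0
        self.misses = 0

    def _store(self, path: str, entries: List[str]) -> None:
        """Insert a listing and evict least recently used entries if full."""
        self.cache[path] = entries
        self.cache.move_to_end(path)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def predict(self, path: str) -> Optional[str]:
        """Return the directory most often listed right after `path`."""
        counts = self.successors.get(path)
        if not counts:
            return None
        return counts.most_common(1)[0][0]

    def listdir(self, path: str) -> List[str]:
        # Record the transition from the previously listed directory.
        if self.last is not None and self.last != path:
            self.successors[self.last][path] += 1
        self.last = path

        if path in self.cache:
            self.hits += 1
            self.cache.move_to_end(path)
            entries = self.cache[path]
        else:
            self.misses += 1
            entries = self.fetch(path)
            self._store(path, entries)

        # Prefetch the predicted next directory without user intervention.
        # Done synchronously here for simplicity; a real client would
        # presumably issue this request in the background.
        nxt = self.predict(path)
        if nxt is not None and nxt not in self.cache:
            self._store(nxt, self.fetch(nxt))
        return entries
```

In this sketch, once the history has recorded that `/b` tends to follow `/a`, a listing of `/a` also pulls `/b` into the cache, so the later request for `/b` is served locally instead of paying a round trip to the storage servers.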
