In the era of Big Data, a single directory can contain tens of thousands or even millions of files. The directory update methods employed by traditional POSIX-compliant file systems do not adapt well to wide-area networks and can waste a large amount of bandwidth because of their full cache invalidation approach. In traditional file systems, even a small change to a directory (e.g., renaming a file) forces the entire cached copy of the directory metadata to be discarded and a fresh copy to be fetched from the server, resulting in poor performance in low-bandwidth environments. In this paper, we propose a directory metadata update strategy that partitions metadata into blocks and transfers only the modified block(s) over the network to reduce transmission time. We implement a proof-of-concept prototype using the FUSE user-space file system framework to verify the effectiveness of our approach. Results show that for a directory whose directory entries occupy 5 MB, the update time for small changes can be reduced by roughly a factor of 20.
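To make the block-based update idea concrete, the following sketch (Python, purely illustrative; the block size, the use of SHA-1 digests, and the names partition, block_digests, and sync_directory are our assumptions, not the prototype's actual FUSE implementation) shows one way a client could compare per-block digests of its cached directory metadata against the server's digests and fetch only the blocks that differ:

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # assumed block size for partitioning directory metadata


def partition(metadata: bytes) -> list[bytes]:
    """Split serialized directory entries into fixed-size blocks."""
    return [metadata[i:i + BLOCK_SIZE] for i in range(0, len(metadata), BLOCK_SIZE)]


def block_digests(blocks: list[bytes]) -> list[str]:
    """Digest each block so client and server can compare blocks cheaply."""
    return [hashlib.sha1(b).hexdigest() for b in blocks]


def sync_directory(cached_blocks: list[bytes],
                   server_digests: list[str],
                   fetch_block) -> list[bytes]:
    """Refresh cached directory metadata, transferring only the blocks
    whose digest no longer matches the server's copy."""
    cached_digests = block_digests(cached_blocks)
    fresh = []
    for i, digest in enumerate(server_digests):
        if i < len(cached_digests) and cached_digests[i] == digest:
            fresh.append(cached_blocks[i])   # block unchanged: reuse the cache
        else:
            fresh.append(fetch_block(i))     # block changed or new: fetch from server
    return fresh
```

In such a scheme the block size trades off transfer granularity against digest-comparison overhead: smaller blocks mean that a single renamed file invalidates less data, at the cost of more digests exchanged per directory.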