A Metadata Update Strategy for Large Directories in Wide-Area File Systems

In the era of Big Data, a single directory can contain tens of thousands or even millions of files. The directory update methods used by traditional POSIX-compliant file systems do not adapt well to wide-area networks: because they invalidate the entire cache, they can consume a large amount of bandwidth unnecessarily. In a traditional file system, even a small change to a directory (e.g., renaming a file) forces the client to discard its whole cached copy of the directory metadata and fetch a new copy from the server, resulting in poor performance in low-bandwidth environments. In this paper, we propose a directory metadata update strategy that partitions the metadata into blocks and transfers only the modified block(s) over the network, reducing transmission time. We implement a proof-of-concept prototype as a FUSE user-space file system to verify the effectiveness of our approach. Results show that, for a directory whose entries total 5 MB, the update time for small changes can be reduced by roughly a factor of 20.
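
The block-level update idea can be illustrated with a minimal sketch. It is not the prototype's implementation: the block count, the assignment of entries to blocks by hashing their names, the use of SHA-256 digests to detect modified blocks, and the names `NUM_BLOCKS`, `block_of`, `partition`, `block_digest`, and `changed_blocks` are all assumptions made for illustration.

```python
import hashlib

NUM_BLOCKS = 64  # hypothetical block count; the abstract does not specify one


def block_of(name: str) -> int:
    """Map an entry name to a block index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BLOCKS


def partition(entries: dict) -> list:
    """Split a directory (name -> inode number) into NUM_BLOCKS blocks."""
    blocks = [{} for _ in range(NUM_BLOCKS)]
    for name, inode in entries.items():
        blocks[block_of(name)][name] = inode
    return blocks


def block_digest(block: dict) -> str:
    """Fingerprint one block; entries are sorted so insertion order is irrelevant."""
    h = hashlib.sha256()
    for name, inode in sorted(block.items()):
        h.update(f"{name}\0{inode}\n".encode("utf-8"))
    return h.hexdigest()


def changed_blocks(cached_digests: list, server_blocks: list) -> dict:
    """Return only the server blocks whose digest differs from the client's cache."""
    return {
        i: blk
        for i, blk in enumerate(server_blocks)
        if block_digest(blk) != cached_digests[i]
    }


if __name__ == "__main__":
    # Server-side directory with 10,000 entries; the client caches all blocks once.
    server = {f"file{i}": i for i in range(10_000)}
    cache = partition(server)
    digests = [block_digest(b) for b in cache]

    # A rename touches at most two blocks: the old name's and the new name's.
    server["renamed"] = server.pop("file42")
    delta = changed_blocks(digests, partition(server))
    for i, blk in delta.items():
        cache[i] = blk  # only these blocks would cross the network
    print(f"{len(delta)} of {NUM_BLOCKS} blocks transferred")
```

Under these assumptions, a rename causes at most two of the 64 blocks to be re-sent rather than the whole directory listing. Assigning entries to blocks by name hash, rather than by position in a sorted listing, keeps an insertion or deletion from shifting every later block.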