Efficient dentry lookup with backward finding mechanism

As modern computer systems face the challenge of managing large amounts of data, filesystems must handle a large number of files, which amplifies the cost of metadata and data operations. Linux filesystems manage file metadata by constructing in-memory structures such as directory entries (dentries) and inodes. However, we found inefficiencies in these metadata management mechanisms, especially in the path traversal mechanism of Linux filesystems when searching for a dentry in the dentry cache. In this paper, we optimize the metadata operations of path traversal by searching for the dentry in a backward manner. With this backward finding mechanism, the target dentry can be found with fewer dentry cache lookups than with the original forward finding mechanism. However, backward path lookup complicates the permission checks normally performed on each path component. We address this issue by introducing a permission-granted list. We evaluated our optimized techniques with several benchmarks, including a real-world workload. The experimental results show that our optimizations improve path lookup latency by up to 40% and overall throughput by up to 56% in real-world benchmarks with many deeply nested files.
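To make the idea concrete, the following is a minimal sketch, not the paper's kernel implementation: a toy dentry cache keyed by full path, contrasting the conventional forward per-component walk with a backward lookup that probes the target component first and then verifies ancestors only until it reaches a directory already on a permission-granted list. The class and field names (`DentryCache`, `perm_granted`, `readable`) are illustrative assumptions, not the authors' interfaces.

```python
# Illustrative sketch (assumed names, not the paper's actual kernel code):
# a toy dentry cache contrasting forward per-component lookup with a
# backward lookup backed by a permission-granted list.

class Dentry:
    def __init__(self, path, is_dir=False, readable=True):
        self.path = path
        self.is_dir = is_dir
        self.readable = readable  # stand-in for search/execute permission

class DentryCache:
    def __init__(self):
        self.by_path = {}          # full-path key -> Dentry
        self.perm_granted = set()  # directories whose permissions passed

    def insert(self, dentry):
        self.by_path[dentry.path] = dentry
        if dentry.is_dir and dentry.readable:
            self.perm_granted.add(dentry.path)

    def forward_lookup(self, path):
        """Walk root -> leaf: one cache probe per path component."""
        probes, cur, d = 0, "", None
        for comp in path.strip("/").split("/"):
            cur = f"{cur}/{comp}"
            probes += 1
            d = self.by_path.get(cur)
            if d is None or (d.is_dir and not d.readable):
                return None, probes
        return d, probes

    def backward_lookup(self, path):
        """Probe the target first, then verify ancestors, stopping early
        at any directory already on the permission-granted list."""
        probes = 1
        d = self.by_path.get(path)
        if d is None:
            return None, probes
        parent = path.rsplit("/", 1)[0]
        while parent:
            if parent in self.perm_granted:  # already verified: done
                return d, probes
            probes += 1
            pd = self.by_path.get(parent)
            if pd is None or not pd.readable:
                return None, probes
            parent = parent.rsplit("/", 1)[0]
        return d, probes
```

For a cached, already-verified path such as `/a/b/c/file`, the forward walk costs one probe per component (four here), while the backward variant finds the target in a single probe because `/a/b/c` is on the permission-granted list. This illustrates why the savings grow with path depth, matching the paper's observation that the gains are largest for deeply nested files.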
