Explicit Data Correlations-Directed Metadata Prefetching Method in Distributed File Systems

Metadata performance in distributed file systems (DFS) is critical due to two trends: (a) modern storage systems are expected to grow beyond billions of files, most of which are small; and (b) over half of all file accesses are metadata operations. In this work, we present SMeta, a metadata prefetching method that is seamlessly integrated into DFS for ease of use and that significantly improves the scalability of metadata performance. Previous prefetching proposals primarily focus on mining, from the access history, groups of files that tend to be accessed together. However, our study found that these solutions are likely to miss a large number of correlated files whose co-occurrence frequency is too low to be detected. Instead of relying on access correlations, we take a novel and completely different approach: we explore explicit data correlations by understanding the reference relationships between files that are encoded in various forms of hyperlinks, which naturally exist in many applications. To embrace this new concept, SMeta discovers correlations when files are written, using a lightweight pattern matching algorithm; stores these correlations in reserved extended attributes of file metadata to avoid changes to DFS APIs; and collapses the multiple I/O rounds needed to access the metadata of a target file and its data-correlated files into a single round. A cost-efficient adaptive feedback mechanism is introduced to improve prefetching accuracy. We implemented SMeta atop Ceph and evaluated it using synthetic and real system workloads. Compared to the baselines, SMeta provides better metadata performance in terms of latency, throughput, and scalability.
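The three write-time and read-time steps described above can be illustrated with a small in-memory sketch. It is not SMeta's implementation and does not use Ceph's API. The toy metadata server, the link patterns (HTML `href`/`src` attributes and Markdown links), and the extended-attribute key `user.smeta.correlations` are all hypothetical. Link targets are treated directly as paths, with no relative-path resolution, and the adaptive feedback mechanism is not modeled.

```python
import re
from dataclasses import dataclass, field

# Hypothetical xattr key; the abstract does not name SMeta's reserved attribute.
CORRELATION_XATTR = "user.smeta.correlations"

# Hypothetical "hyperlink" patterns: HTML href/src attributes and Markdown links.
LINK_PATTERN = re.compile(r'(?:href|src)="([^"\s]+)"|\]\(([^)\s]+)\)')


def extract_links(content: str) -> list[str]:
    """Lightweight pattern match over file content, run at write time.

    Returns referenced paths in first-seen order, without duplicates.
    """
    links: list[str] = []
    for m in LINK_PATTERN.finditer(content):
        target = m.group(1) or m.group(2)  # whichever alternative matched
        if target not in links:
            links.append(target)
    return links


@dataclass
class Inode:
    path: str
    size: int
    xattrs: dict[str, str] = field(default_factory=dict)


class ToyMetadataServer:
    """In-memory stand-in for a DFS metadata server (hypothetical)."""

    def __init__(self) -> None:
        self.inodes: dict[str, Inode] = {}
        self.rounds = 0  # client-server round trips served

    def write(self, path: str, content: str) -> None:
        inode = Inode(path, len(content.encode()))
        # Keep discovered correlations in an extended attribute, so the
        # file-system API itself does not have to change.
        inode.xattrs[CORRELATION_XATTR] = "\n".join(extract_links(content))
        self.inodes[path] = inode

    def lookup_with_prefetch(self, path: str) -> dict[str, Inode]:
        """Return the target's metadata plus that of its data-correlated
        files in one round trip, instead of one round per file."""
        self.rounds += 1
        target = self.inodes[path]
        result = {path: target}
        raw = target.xattrs.get(CORRELATION_XATTR, "")
        for linked in filter(None, raw.split("\n")):
            if linked in self.inodes:  # skip links to files that do not exist
                result[linked] = self.inodes[linked]
        return result


if __name__ == "__main__":
    mds = ToyMetadataServer()
    mds.write("a.css", "body {}")
    mds.write("logo.png", "PNG")
    mds.write("index.html", '<link href="a.css"><img src="logo.png">')
    batch = mds.lookup_with_prefetch("index.html")
    print(sorted(batch), mds.rounds)
```

In this sketch, opening `index.html` returns the metadata for `a.css` and `logo.png` in the same round trip, which is the one-round collapse the abstract describes. Because correlations are computed once at write time and stored beside the inode, the read path needs no history mining. That contrasts with the access-correlation approaches the abstract criticizes.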
