Design and Implementation of an Asymmetric Block-Based Parallel File System

Existing block-based parallel file systems, deployed in a storage area network (SAN), blend metadata with data on the underlying disks. Unfortunately, this symmetric architecture is prone to system-level failures, since metadata on shared disks can be corrupted by a malfunctioning client. In this paper, we present Redbud, an asymmetric block-based parallel file system that isolates metadata storage in the access domain of the metadata server (MDS). Although centralized metadata management effectively improves system reliability, it poses challenges for performance and availability. To address them, we introduce an embedded directory mechanism to exploit the disk bandwidth of the metadata storage, and adaptive layout operations to deliver high I/O throughput across diverse file access patterns. In addition, by taking the MDS's load into account, we propose an adaptive timeout algorithm that adapts MDS failure detection to evolving workloads, improving system availability. Measurements across a wide range of workloads demonstrate the benefits of our design and show that Redbud scales well.
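The abstract's adaptive timeout idea, a failure-detection threshold that grows and shrinks with the MDS's observed load, can be sketched as follows. This is a minimal illustration only: the class name, the EWMA smoothing of response times, and all constants are assumptions for exposition, not Redbud's actual algorithm.

```python
class AdaptiveTimeout:
    """Hypothetical load-adaptive failure-detection timeout for an MDS.

    The timeout tracks an exponentially weighted moving average (EWMA) of
    observed MDS response times, so a heavily loaded (slow) MDS is not
    declared failed prematurely. All parameters are illustrative.
    """

    def __init__(self, base_timeout=1.0, alpha=0.2, slack=4.0):
        self.base_timeout = base_timeout  # floor for the timeout (seconds)
        self.alpha = alpha                # EWMA smoothing factor
        self.slack = slack                # safety multiplier over observed latency
        self.ewma_rtt = None              # smoothed MDS response time

    def observe(self, rtt):
        """Record one observed MDS response time (seconds)."""
        if self.ewma_rtt is None:
            self.ewma_rtt = rtt
        else:
            self.ewma_rtt = self.alpha * rtt + (1 - self.alpha) * self.ewma_rtt

    def timeout(self):
        """Current failure threshold: never below the base, scales with load."""
        if self.ewma_rtt is None:
            return self.base_timeout
        return max(self.base_timeout, self.slack * self.ewma_rtt)
```

Under this sketch, a client would call `observe()` on every completed MDS request and declare the MDS failed only after `timeout()` seconds of silence, so the detection threshold rises as measured latency rises under load.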
