FlexGroup Volumes: A Distributed WAFL File System

The rapid growth of customer applications and datasets has led to demand for storage that can scale with the needs of modern workloads. We have developed FlexGroup volumes to meet this need. FlexGroups combine local WAFL® file systems in a distributed storage cluster to provide a single namespace that seamlessly scales across the aggregate resources of the cluster (CPU, storage, etc.) while preserving the features and robustness of the WAFL file system. In this paper, we present the FlexGroup design, which includes a new remote access layer that supports distributed transactions, as well as novel heuristics for balancing load and capacity across a storage cluster. We evaluate FlexGroup performance and efficacy through lab tests and field data from over 1,000 customer FlexGroups.
