Extensible block-level storage virtualization in cluster-based systems

High-performance storage systems are evolving towards decentralized commodity clusters that can scale in capacity, processing power, and network throughput. Building such systems requires (a) sharing physical resources among applications, (b) sharing data among applications, and (c) allowing customized data views. Current solutions typically satisfy the first two requirements through a cluster file system, resulting in monolithic, hard-to-manage systems. In this paper, we present a storage system that addresses all three requirements by extending the block layer below the file system. First, we discuss how our system provides customized (virtualized) storage views within a single node. Then, we discuss how it scales in clustered setups. To achieve efficient resource and data sharing, we support block-level allocation and locking as in-band mechanisms. We implement a prototype under Linux and use it to build a shared cluster file system. Our evaluation on a 24-node cluster shows that our approach offers flexibility and scalability and reduces the effort needed to implement new functionality.
