Orchestra: Extensible Block-Level Support for Resource and Data Sharing in Networked Storage Systems

High-performance storage systems are evolving towards decentralized commodity clusters that can scale in capacity, processing power, and network throughput. Building such systems requires: (a)Sharing physical resources among applications; (b)Sharing data among applications; (c) Allowing customized views of data for applications. Current solutions satisfy typically the first two requirements through a distributed file-system, resulting in monolithic, hard-to-manage storage systems. In this paper, we present Orchestra, a novel storage system that addresses all three above requirements below the file-system by extending the block layer. To provide customized views, Orchestra allows applications to create semantically-rich virtual block devices by combining simpler ones. To achieve efficient resource and data sharing it supports block-level allocation and byte-range locking as in-band mechanisms. We implement Orchestra under Linux and use it to build a shared cluster file-system. We evaluate it on a 16-node cluster, finding that the flexibility offered by Orchestra introduces little overhead beyond mandatory communication and disk access costs.

[1]  Chandramohan A. Thekkath,et al.  Frangipani: a scalable distributed file system , 1997, SOSP.

[2]  Richard A. Golding,et al.  D-SPTF: decentralized request distribution in brick-based storage systems , 2004, ASPLOS XI.

[3]  Chandramohan A. Thekkath,et al.  Petal: distributed virtual disks , 1996, ASPLOS VII.

[4]  Marc Najork,et al.  Boxwood: Abstractions as the Foundation for Storage Infrastructure , 2004, OSDI.

[5]  Frank B. Schmuck,et al.  GPFS: A Shared-Disk File System for Large Computing Clusters , 2002, FAST.

[6]  Jim Zelenka,et al.  A cost-effective, high-bandwidth storage architecture , 1998, ASPLOS VIII.

[7]  Gregory R. Ganger,et al.  Ursa minor: versatile cluster-based storage , 2005, FAST'05.

[8]  Grant Erickson,et al.  A 64-bit, shared disk file system for Linux , 1999, 16th IEEE Symposium on Mass Storage Systems in cooperation with the 7th NASA Goddard Conference on Mass Storage Systems and Technologies (Cat. No.99CB37098).

[9]  Garth A. Gibson,et al.  Highly concurrent shared storage , 2000, Proceedings 20th IEEE International Conference on Distributed Computing Systems.

[10]  Jim Gray Storage Bricks Have Arrived , 2002 .

[11]  Angelos Bilas,et al.  Clotho: Transparent Data Versioning at the Block I/O Level , 2004, MSST.

[12]  Jeffrey Katcher,et al.  PostMark: A New File System Benchmark , 1997 .

[13]  Nancy P. Kronenberg,et al.  VAXcluster: a closely-coupled distributed system , 1986, TOCS.

[14]  Barry Phillips,et al.  Have Storage Area Networks Come of Age? , 1998, Computer.

[15]  Steven J. Vaughan-Nichols,et al.  Tempest over web-authoring tools , 2001, Computer.

[16]  Angelos Bilas,et al.  Using Lightweight Transactions and Snapshots for Fault-Tolerant Services Based on Shared Storage Bricks , 2006, 2006 IEEE International Conference on Cluster Computing.

[17]  Arif Merchant,et al.  FAB: building distributed enterprise disk arrays from commodity components , 2004, ASPLOS XI.

[18]  Edward W. Felten,et al.  Simplifying Distributed File Systems Using a Shared Logical Disk , 1996 .

[19]  Angelos Bilas,et al.  Violin: a framework for extensible block-level storage , 2005, 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'05).