Using Lightweight Transactions and Snapshots for Fault-Tolerant Services Based on Shared Storage Bricks

To satisfy current and future application needs cost-effectively, storage systems are evolving from monolithic disk arrays to networked storage architectures based on commodity components. So far, this architectural transition has mostly been envisioned as a way to scale capacity and performance. In this work we examine how the block-level interface exported by such networked storage systems can be extended to deal with reliability. Our goals are: (a) at the design level, to examine how strong reliability semantics can be offered at the block level; and (b) at the implementation level, to examine which mechanisms are required and how they may be provided in a modular and configurable manner. We first discuss how transactional semantics may be offered at the block level. We present a system design that uses the concept of atomic update intervals combined with existing block-level locking and snapshot mechanisms, in contrast to the more common journaling techniques. We discuss in detail the design of the associated mechanisms, as well as the trade-offs and challenges of dividing the required functionality between the file system and the block-level storage. Our approach is based on a unified, and thus non-redundant, set of mechanisms for providing reliability at both the block level and the file level. Our design and implementation effectively provide a tunable, lightweight transaction mechanism to higher system and application layers. Finally, we describe how the associated protocols can be implemented in a modular way in a prototype storage system we are currently building. As our system is currently being implemented, we do not present performance results.
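To make the idea of an atomic update interval more concrete, the following is a minimal, single-process sketch, not the design or protocol described in this work. It assumes a toy in-memory volume, per-block locks standing in for block-level locking, and a copy-on-write pre-image map standing in for a snapshot. All names (`ToyVolume`, `AtomicInterval`) are hypothetical and chosen only for illustration. The point it shows is that a group of block writes can be made all-or-nothing by rolling back to saved pre-images rather than by replaying a journal.

```python
import threading


class ToyVolume:
    """Hypothetical in-memory stand-in for a shared block volume."""

    def __init__(self, num_blocks, block_size=4):
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        # One lock per block, standing in for block-level locking.
        self.locks = [threading.Lock() for _ in range(num_blocks)]

    def read(self, n):
        return self.blocks[n]


class AtomicInterval:
    """Groups block writes so that either all or none become visible."""

    def __init__(self, volume, block_ids):
        self.volume = volume
        # Sorting gives every interval the same lock order, avoiding deadlock.
        self.block_ids = sorted(set(block_ids))
        # Pre-images of modified blocks: a tiny copy-on-write "snapshot".
        self.preimages = {}

    def __enter__(self):
        for n in self.block_ids:
            self.volume.locks[n].acquire()
        return self

    def write(self, n, data):
        if n not in self.block_ids:
            raise KeyError(f"block {n} is not locked by this interval")
        # Save the old contents only the first time a block is touched.
        self.preimages.setdefault(n, self.volume.blocks[n])
        self.volume.blocks[n] = data

    def __exit__(self, exc_type, exc, tb):
        try:
            if exc_type is not None:
                # Failure inside the interval: restore every pre-image.
                for n, old in self.preimages.items():
                    self.volume.blocks[n] = old
        finally:
            for n in reversed(self.block_ids):
                self.volume.locks[n].release()
        return False  # let the original exception propagate


vol = ToyVolume(num_blocks=4)

# A successful interval: both writes become visible together.
with AtomicInterval(vol, [0, 1]) as interval:
    interval.write(0, b"aaaa")
    interval.write(1, b"bbbb")

# A failed interval: the write to block 1 is undone.
try:
    with AtomicInterval(vol, [1, 2]) as interval:
        interval.write(1, b"cccc")
        raise RuntimeError("simulated failure mid-interval")
except RuntimeError:
    pass
```

A real system would additionally have to handle distributed locks, persistent snapshots, and node failures while locks are held; none of that is modeled here.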
