SafeStore: A Durable and Practical Storage System

This paper presents SafeStore, a distributed storage system designed to maintain long-term data durability despite conventional hardware and software faults, environmental disruptions, and administrative failures caused by human error or malice. The architecture of SafeStore is based on fault isolation, which SafeStore applies aggressively along administrative, physical, and temporal dimensions by spreading data across autonomous storage service providers (SSPs). However, current storage interfaces provided by SSPs are not designed for high end-to-end durability. In this paper, we propose a new storage system architecture that (1) spreads data efficiently across autonomous SSPs using informed hierarchical erasure coding, which, for a given replication cost, provides several additional 9's of durability over what can be achieved with existing black-box SSP interfaces, (2) performs an efficient end-to-end audit of SSPs to detect data loss, which, for a 20% cost increase, improves data durability by two 9's by reducing MTTR, and (3) offers durable storage with cost, performance, and availability competitive with traditional storage systems. We instantiate and evaluate these ideas by building a SafeStore-based file system with an NFS-like interface.
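
To illustrate why reducing MTTR translates into additional 9's of durability, the following is a minimal sketch under strong simplifying assumptions. It is not the durability model used in SafeStore. It assumes a single object mirrored on two replicas with independent, exponentially distributed failures, reads MTTR as the mean time to detect and repair a lost replica, and uses the standard mirrored-pair approximation MTTDL ≈ MTTF² / (2 · MTTR). The function names and the example MTTF and MTTR values are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365


def mttdl_mirrored_pair(mttf_hours: float, mttr_hours: float) -> float:
    """Approximate mean time to data loss for two independent replicas.

    Standard mirrored-pair approximation, valid when MTTR << MTTF:
    data is lost only if the second replica fails while the first
    is still being repaired.
    """
    return mttf_hours ** 2 / (2.0 * mttr_hours)


def annual_loss_probability(mttdl_hours: float) -> float:
    """Probability of losing the data within one year, assuming an
    exponential time to data loss with the given mean."""
    return -math.expm1(-HOURS_PER_YEAR / mttdl_hours)


def nines(loss_probability: float) -> float:
    """Durability expressed as a number of 9's (0.999 -> 3.0)."""
    return -math.log10(loss_probability)


def nines_gained(mttf_hours: float, slow_mttr: float, fast_mttr: float) -> float:
    """9's of durability gained by shortening MTTR from slow to fast."""
    before = nines(annual_loss_probability(mttdl_mirrored_pair(mttf_hours, slow_mttr)))
    after = nines(annual_loss_probability(mttdl_mirrored_pair(mttf_hours, fast_mttr)))
    return after - before


if __name__ == "__main__":
    # Hypothetical values: replica MTTF of 100,000 hours; loss noticed and
    # repaired within 720 hours without auditing, 7.2 hours with auditing.
    gain = nines_gained(100_000.0, 720.0, 7.2)
    print(f"9's gained by a 100x shorter MTTR: {gain:.2f}")
```

In this simplified model, MTTDL is inversely proportional to MTTR, so a 100-fold reduction in MTTR yields roughly two additional 9's. The cost of achieving that reduction, which the abstract puts at 20% for SafeStore's audit, is outside the scope of the sketch.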
