Analysis of Data Reliability Tradeoffs in Hybrid Distributed Storage Systems

This paper surveys existing distributed storage systems and the data redundancy and fault-tolerance schemes introduced to mitigate the impact of host churn on data reliability. It then proposes a hybrid storage system model that offers a reliable data storage service by combining idle storage contributed by volatile peer nodes with stable, durable storage utilities. To ensure high availability and durability in this hybrid system, we explore four reliability improvement strategies (File Replica Strategy, File Encoding Strategy, Replica Repair Strategy, and Stable-Volatile Strategy) as well as their combinations. Extensive simulations based on real traces evaluate data availability, data durability, and storage overhead. The simulation results show that, compared with previous peer-to-peer storage systems, the proposed hybrid storage system achieves higher availability and durability with less storage consumption, owing to the proposed strategies. Finally, taking storage and traffic costs into account, we discuss the tradeoffs between storage efficiency and reliability.
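
As a rough illustration of the storage-versus-reliability tradeoff between replication and encoding, the following sketch compares the two under a deliberately simple assumption: every node is online independently with the same probability p. This is a textbook approximation and not the model or simulator used in the paper; the function names and the parameters p, r, n, and k are hypothetical and chosen only for this example.

```python
from math import comb


def replication_availability(p: float, r: int) -> float:
    """Probability that at least one of r full replicas is online.

    Assumes each replica sits on a node that is online independently
    with probability p (an illustrative assumption).
    """
    return 1.0 - (1.0 - p) ** r


def erasure_availability(p: float, n: int, k: int) -> float:
    """Probability that at least k of n coded fragments are online.

    Models an (n, k) erasure code: any k fragments rebuild the file.
    Same independence assumption as above.
    """
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def storage_overhead(n: int, k: int) -> float:
    """Stored bytes per byte of user data; r-way replication is (r, 1)."""
    return n / k


if __name__ == "__main__":
    p = 0.5  # hypothetical per-node availability for a volatile peer
    # Both schemes below store 3x the original data.
    print(storage_overhead(3, 1), replication_availability(p, 3))  # 3.0 0.875
    print(storage_overhead(6, 2), erasure_availability(p, 6, 2))   # 3.0 0.890625
```

Under these assumptions, a (6, 2) code gives slightly higher availability than 3-way replication at the same 3x storage overhead. Real traces typically show correlated and time-varying node behavior, so figures from such a closed-form estimate should be read as indicative only.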
