Data durability in peer to peer storage systems

In this paper we present a quantitative study of data survival in peer to peer storage systems. We first recall two main redundancy mechanisms: replication and erasure codes, which are used by most peer to peer storage systems like OceanStore, PAST or CFS, to guarantee data durability. Second we characterize peer to peer systems according to a volatility factor (a peer is free to leave the system at anytime) and to an availability factor (a peer is not permanently connected to the system). Third we model the behavior of a system as a Markov chain and analyse the average life time of data (MTTF) according to the volatility and availability factors. We also present the cost of the repair process based on these redundancy schemes to recover failed peers. The conclusion of this study is that when there is no high availability of peers, a simple replication scheme may be more efficient than sophisticated erasure codes.

[1]  Andrew V. Goldberg,et al.  A prototype implementation of archival Intermemory , 1999, DL '99.

[2]  Matei Ripeanu,et al.  Peer-to-peer architecture case study: Gnutella network , 2001, Proceedings First International Conference on Peer-to-Peer Computing.

[3]  Andrew V. Goldberg,et al.  Towards an archival Intermemory , 1998, Proceedings IEEE International Forum on Research and Technology Advances in Digital Libraries -ADL'98-.

[4]  James S. Plank A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems , 1997 .

[5]  Antony I. T. Rowstron,et al.  PAST: a large-scale, persistent peer-to-peer storage utility , 2001, Proceedings Eighth Workshop on Hot Topics in Operating Systems.

[6]  Ben Y. Zhao,et al.  OceanStore: An Extremely Wide-Area Storage System , 2002, ASPLOS 2002.

[7]  John Kubiatowicz,et al.  Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.

[8]  Ian Clarke,et al.  A Distributed Decentralised Information Storage and Retrieval System , 1999 .

[9]  Ben Y. Zhao,et al.  OceanStore: an architecture for global-scale persistent storage , 2000, SIGP.

[10]  S PlankJames,et al.  A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems , 1997 .

[11]  Daniel A. Spielman,et al.  Efficient erasure correcting codes , 2001, IEEE Trans. Inf. Theory.

[12]  ChrisWells ComputerScienceDivision Universityof The OceanStore Archive : Goals , Structures , and Self-Repair , 2001 .

[13]  Leonard Kleinrock,et al.  Queueing Systems: Volume I-Theory , 1975 .