Lifetime and availability of data stored on a P2P system: Evaluation of redundancy and recovery schemes

This paper studies the performance of Peer-to-Peer storage and backup systems (P2PSS). These systems rest on three pillars: fragmentation of data and its dissemination among the peers, redundancy mechanisms to cope with peer churn, and repair mechanisms to recover lost or temporarily unavailable data. Redundancy is usually achieved either by replication or by erasure codes. More recently, a new class of network codes, regenerating codes, has been proposed; we therefore adapt our work to all three redundancy schemes. We introduce two mechanisms for recovering lost data and evaluate their performance by modeling them as absorbing Markov chains. Specifically, for each recovery mechanism we evaluate the quality of service provided to users in terms of the durability and availability of stored data, and we deduce the impact of the mechanism's parameters on system performance. The first mechanism is centralized and relies on a single server that can recover multiple losses at once. The second mechanism is distributed: lost fragments are reconstructed sequentially on many peers until the required level of redundancy is attained. The key assumptions made in this work agree with the analyses in [1] and [2]. The assumptions on the recovery process follow [1], and the assumptions on the distribution of peer on-times follow [2]. As a result, the models are general enough to apply to many distributed environments, as shown through numerical computations. We find that in stable environments, such as local-area or research-institute networks where machines are usually highly available, the distributed-repair scheme in erasure-coded systems offers a reliable, scalable and cheap storage/backup solution. In highly dynamic environments, the distributed-repair scheme is generally inefficient, particularly at maintaining high data availability, unless data redundancy is high.
Using regenerating codes overcomes this limitation of the distributed-repair scheme. P2PSS with a centralized-repair scheme are efficient in any environment but have the disadvantage of relying on a centralized authority. The analysis of the overhead costs (e.g. computation, bandwidth and complexity) of the different redundancy schemes, weighed against their advantages (e.g. simplicity), is left for future work.
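The absorbing-Markov-chain approach can be made concrete with a small example. What follows is a minimal Python sketch of expected data lifetime under the two repair styles. It is not the paper's model; it is a hypothetical simplified continuous-time chain built on these assumptions:

- The state is the number b of available fragments out of n = s + r, and any s fragments suffice to rebuild the data.
- Each available fragment is lost independently at rate `alpha`.
- Repair is triggered as soon as one fragment is missing, which is a repair threshold of one.
- Centralized repair restores every missing fragment at once (b to n) at rate `repair_rate`.
- Distributed repair rebuilds one fragment at a time (b to b + 1) at the same rate.

The data are lost when the chain reaches the absorbing state b = s - 1. The expected lifetime then follows from solving Q_T t = -1, where Q_T is the generator restricted to the transient states. The function name and its parameters are illustrative.

```python
import numpy as np


def expected_lifetime(s, r, alpha, repair_rate, scheme):
    """Mean time until fewer than s of the n = s + r fragments remain.

    Hypothetical simplified CTMC (not the paper's exact model):
    transient states b = s..n count available fragments; b = s - 1 is
    the absorbing "data lost" state and is left out of the matrix.
    """
    if scheme not in ("centralized", "distributed"):
        raise ValueError("scheme must be 'centralized' or 'distributed'")
    n = s + r
    states = list(range(s, n + 1))
    idx = {b: i for i, b in enumerate(states)}
    # Generator restricted to the transient states (Q_T).
    Q = np.zeros((len(states), len(states)))
    for b in states:
        i = idx[b]
        # Each of the b available fragments is lost independently at rate alpha.
        # When b == s this jump enters the absorbing state, so only the
        # diagonal (total outflow) is updated.
        if b - 1 >= s:
            Q[i, idx[b - 1]] += b * alpha
        Q[i, i] -= b * alpha
        if b < n:
            # Assumption: repair starts as soon as one fragment is missing.
            # Centralized: restore every missing fragment in one step.
            # Distributed: rebuild a single fragment per step.
            target = n if scheme == "centralized" else b + 1
            Q[i, idx[target]] += repair_rate
            Q[i, i] -= repair_rate
    # Expected absorption times t satisfy Q_T t = -1 (first-step analysis).
    t = np.linalg.solve(Q, -np.ones(len(states)))
    return float(t[idx[n]])


if __name__ == "__main__":
    for scheme in ("centralized", "distributed"):
        print(scheme, expected_lifetime(s=8, r=4, alpha=0.01, repair_rate=1.0, scheme=scheme))
```

For example, `expected_lifetime(1, 2, 1.0, 1.0, "centralized")` evaluates to 3.0, and the distributed counterpart evaluates to 8/3. Giving both schemes the same rate is a modeling convenience here. It is not a claim about their relative speed in a real system, where a centralized repair must first download s fragments.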
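The trade-off between replication and erasure codes can be illustrated with the standard static comparison. It assumes each replica or fragment is online independently with the same probability p:

- With `copies` replicas, the data are available if at least one replica is online.
- With an erasure code that splits the data into s fragments and stores n, the data are available if at least s fragments are online.

The sketch below computes both quantities. Independence and a single per-peer availability p are simplifying assumptions. This static calculation is distinct from this paper's Markov-chain analysis of availability. The function names are hypothetical.

```python
from math import comb


def replication_availability(p, copies):
    # Available if at least one of `copies` independent replicas is online.
    return 1 - (1 - p) ** copies


def erasure_availability(p, s, n):
    # Available if at least s of the n independent fragments are online
    # (binomial tail: sum over i >= s of C(n, i) p^i (1 - p)^(n - i)).
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(s, n + 1))


if __name__ == "__main__":
    # Equal storage overhead of 2x: two replicas versus an (s, n) = (4, 8) code.
    for p in (0.5, 0.9):
        print(p, replication_availability(p, 2), erasure_availability(p, 4, 8))
```

The two schemes can be compared at equal storage overhead: two replicas versus an (s, n) = (4, 8) code. At p = 0.5, replication gives 0.75 while the code gives 163/256, which is about 0.64. At p = 0.9, replication gives 0.99 while the code gives about 0.9996. Under these assumptions, erasure codes pay off mainly when peers are highly available.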

[1] Abdulhalim Dandoush, et al. Simulation analysis of download and recovery processes in P2P storage systems, 2009, 2009 21st International Teletraffic Congress.

[2] Alexandru Iosup, et al. The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems, 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.

[3] K. Mani Chandy, et al. Open, Closed, and Mixed Networks of Queues with Different Classes of Customers, 1975, JACM.

[4] T. Mexia, et al. Author's personal copy, 2009.

[5] Peter G. Harrison, et al. Queueing models of RAID systems with maxima of waiting times, 2007, Perform. Evaluation.

[6] Stefan Savage, et al. Understanding Availability, 2003, IPTPS.

[7] Marcel F. Neuts, et al. Matrix-Geometric Solutions in Stochastic Models, 1981.

[8] David R. Karger, et al. Wide-area cooperative storage with CFS, 2001, SOSP.

[9] Alexandros G. Dimakis, et al. Network Coding for Distributed Storage Systems, 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.

[10] Abdulhalim Dandoush, et al. Performance Analysis of Centralized versus Distributed Recovery Schemes in P2P Storage Systems, 2009, Networking.

[11] Joseph Pasquale, et al. Analysis of Long-Running Replicated Systems, 2006, Proceedings IEEE INFOCOM 2006. 25TH IEEE International Conference on Computer Communications.

[12] Stefan Saroiu, et al. A Measurement Study of Peer-to-Peer File Sharing Systems, 2001.

[13] Ravi Jain, et al. An Experimental Study of the Skype Peer-to-Peer VoIP System, 2005, IPTPS.

[14] David R. Karger, et al. Chord: A scalable peer-to-peer lookup service for internet applications, 2001, SIGCOMM '01.

[15] Marcel F. Neuts, et al. Matrix-geometric solutions in stochastic models - an algorithmic approach, 1982.

[16] Ben Y. Zhao, et al. OceanStore: an architecture for global-scale persistent storage, 2000, SIGP.

[17] Abdulhalim Dandoush, et al. A realistic simulation model for peer-to-peer storage systems, 2009, VALUETOOLS.

[18] Abdulhalim Dandoush, et al. Performance Analysis of Peer-to-Peer Storage Systems, 2007, ITC.

[19] Abdulhalim Dandoush, et al. Flow-Level Modeling of Parallel Download in Distributed Systems, 2010, 2010 Third International Conference on Communication Theory, Reliability, and Quality of Service.

[20] Yafei Dai, et al. Exploring the Cost-Availability Tradeoff in P2P Storage Systems, 2009, 2009 International Conference on Parallel Processing.

[21] Stéphane Pérennes, et al. Analysis of failure correlation impact on peer-to-peer storage systems, 2009, 2009 IEEE Ninth International Conference on Peer-to-Peer Computing.

[22] David Moore, et al. Replication Strategies for Highly Available Peer-to-Peer Storage, 2002, Future Directions in Distributed Computing.

[23] Yunnan Wu, et al. A Survey on Network Codes for Distributed Storage, 2010, Proceedings of the IEEE.

[24] Charles M. Grinstead, et al. Introduction to probability, 1999, Statistics for the Behavioural Sciences.

[25] Antony I. T. Rowstron, et al. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility, 2001, SOSP.

[26] Anne-Marie Kermarrec, et al. Availability-Based Methods for Distributed Storage Systems, 2012, 2012 IEEE 31st Symposium on Reliable Distributed Systems.

[27] Minghua Chen, et al. Queuing models for peer-to-peer systems, 2009, IPTPS.

[28] Abdulhalim Dandoush, et al. Lifetime and availability of data stored on a P2P system: Evaluation of recovery schemes, 2010.

[29] Stefan Savage, et al. Total Recall: System Support for Automated Availability Management, 2004, NSDI.

[30] Richard Wolski, et al. Modeling Machine Availability in Enterprise and Wide-Area Distributed Computing Environments, 2005, Euro-Par.

[31] John N. Tsitsiklis, et al. Introduction to Probability, 2002.

[32] John Kubiatowicz, et al. Erasure Coding Vs. Replication: A Quantitative Comparison, 2002, IPTPS.

[33] Michele Amoretti, et al. Sporadic decentralized resource maintenance for P2P distributed storage networks, 2014, J. Parallel Distributed Comput.

[34] Denis Caromel. ProActive Parallel Suite: Multi-cores to Clouds to autonomicity, 2009.

[35] Vinod M. Prabhakaran, et al. Decentralized erasure codes for distributed networked storage, 2006, IEEE Transactions on Information Theory.

[36] Peter Druschel, et al. Storage management and caching in PAST, 2001.