Data Persistence in P2P Backup Systems

Peer-to-peer (P2P) networks have been shown to be a natural and efficient paradigm for modeling most Internet applications. However, data persistence remains a major challenge, particularly in highly dynamic and unstable P2P networks. Within the framework of collaborative engineering, we propose a probabilistic approach based on peer collaboration to guarantee the persistence of critical data in a system. Markov chains are used to model applications and to realistically capture the behavior of practical systems. The model is first investigated analytically, and data persistence is then measured under an erasure-coding redundancy scheme. The mathematical analysis allows us to determine the extent of data persistence in several important cases and to anticipate the robustness of large-scale dynamic distributed applications.
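
To make the combination of Markov chains and erasure coding concrete, the following is a minimal sketch and not the model analyzed in the paper. It assumes an (m, n) erasure code in which any m of the n stored fragments suffice to rebuild the data, that each fragment is lost independently with probability q per time step, and that no repair takes place. The state of the chain is the number of surviving fragments. All function and parameter names are hypothetical.

```python
from math import comb


def transition_matrix(n, q):
    """One-step transition matrix of a pure-death chain (illustrative assumption).

    State i is the number of surviving fragments. Each fragment is lost
    independently with probability q per step, and nothing is repaired.
    """
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(i + 1):
            # Exactly j of the i current fragments survive this step.
            P[i][j] = comb(i, j) * (1 - q) ** j * q ** (i - j)
    return P


def persistence_probability(n, m, q, steps):
    """Probability that at least m of n fragments survive after `steps` steps."""
    P = transition_matrix(n, q)
    dist = [0.0] * (n + 1)
    dist[n] = 1.0  # all n fragments are present initially
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(n + 1))
                for j in range(n + 1)]
    # The fragment count never increases, so having at least m fragments now
    # means the data was recoverable at every earlier step as well.
    return sum(dist[m:])


def closed_form(n, m, q, steps):
    """Binomial tail with per-fragment survival probability (1 - q) ** steps."""
    p = (1 - q) ** steps
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(m, n + 1))


if __name__ == "__main__":
    # Hypothetical parameters: 8 fragments, any 4 rebuild the data.
    print(persistence_probability(n=8, m=4, q=0.1, steps=5))
    print(closed_form(n=8, m=4, q=0.1, steps=5))
```

Under these assumptions the iterated chain and the closed-form binomial tail agree, which gives a simple sanity check. A more realistic model would add repair transitions and peer churn, at which point a closed form is generally no longer available and the chain has to be analyzed directly.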
