On the interplay between data redundancy and retrieval times in P2P storage systems

Peer-to-peer (P2P) storage systems aggregate spare storage resources from end users to build a large collaborative online storage solution. In these systems, however, high levels of user churn (peers failing or leaving, temporarily or permanently) affect the quality of the storage service and might put data reliability at risk. Indeed, one of the main challenges of P2P storage systems has traditionally been how to guarantee that stored data can always be retrieved within some time frame. To meet this challenge, existing systems store objects with high amounts of data redundancy, yielding data availability values close to 100%, which in turn ensures optimal retrieval times (constrained only by network limits). Unfortunately, this redundancy reduces the overall net capacity of the system and increases data maintenance costs. To alleviate these problems, data redundancy can be reduced at the expense of lengthening retrieval times. The problem is that neither the rewards nor the disadvantages of doing so are well understood. In this paper, we present a novel analytical framework that allows us to model retrieval times in P2P storage systems and describe the interplay between data redundancy and retrieval times for different churn patterns. Using availability traces from real P2P applications, we show that our framework provides accurate estimates of retrieval times in realistic environments.
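
To make the redundancy and availability trade-off concrete, the following is a minimal sketch and not the analytical framework of the paper. It assumes an (n, k) erasure code in which any k of n fragments suffice to rebuild an object, and it assumes that each peer is online independently with the same probability a. The function names and parameters are hypothetical, and real churn is typically neither independent nor homogeneous.

```python
from math import comb


def data_availability(n: int, k: int, a: float) -> float:
    """Probability that at least k of n fragments are online.

    Assumption (not from the paper): peers are online independently,
    each with probability a, so the count of online fragments is binomial.
    """
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def min_fragments(k: int, a: float, target: float) -> int:
    """Smallest n such that an (n, k) code reaches the target availability."""
    if not (0.0 < a <= 1.0 and 0.0 <= target < 1.0):
        raise ValueError("need 0 < a <= 1 and 0 <= target < 1")
    n = k
    while data_availability(n, k, a) < target:
        n += 1
    return n


if __name__ == "__main__":
    k, a = 8, 0.5  # illustrative values only
    for target in (0.9, 0.99, 0.999):
        n = min_fragments(k, a, target)
        print(f"target={target}: n={n}, redundancy ratio n/k={n / k:.2f}")
```

Under these assumptions, pushing availability closer to 100% requires a growing redundancy ratio n/k. Accepting a lower availability, and therefore a possibly longer wait until k fragments are reachable, reduces the storage and maintenance overhead.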
