Simulation analysis of download and recovery processes in P2P storage systems

Peer-to-peer storage systems rely on data fragmentation and distributed storage. Unreachable fragments are continuously recovered, which requires multiple fragments of data (constituting a “block”) to be downloaded in parallel. Recent modeling efforts have assumed that the recovery time is exponentially distributed, an assumption made mainly because no study has characterized the “real” distribution of the recovery process. This work aims to fill this gap through a simulation study. To that end, we implement the distributed storage protocol in the NS-2 network simulator and run a total of seven experiments covering a large variety of scenarios. We show that the fragment download time approximately follows an exponential distribution. We also show that the block download time and the recovery time essentially follow a hypo-exponential distribution with many distinct phases (the maximum of as many exponential random variables). We use expectation-maximization and least-squares estimation algorithms to fit the empirical distributions. We also provide a good approximation of the number of phases of the hypo-exponential distribution that applies in all scenarios considered. Finally, we test the goodness of our fits using statistical (Kolmogorov-Smirnov test) and graphical methods.
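The link between a maximum of exponentials and a hypo-exponential distribution can be illustrated with a minimal sketch. It is an idealization and not the NS-2 setup used in this work: it assumes that a block consists of s fragments whose download times are independent and exponentially distributed with a common rate lam, and that the block is complete once all s downloads finish. Under these assumptions the block download time is hypo-exponential with s distinct phases of rates lam, 2·lam, ..., s·lam. The names s, lam, hypoexp_cdf and ks_statistic are hypothetical and chosen for illustration only.

```python
import math
import random


def hypoexp_cdf(t, rates):
    """CDF of a sum of independent exponentials with pairwise distinct rates."""
    if t <= 0:
        return 0.0
    survival = 0.0
    for i, rate_i in enumerate(rates):
        # Partial-fraction coefficient of phase i in the survival function.
        coeff = 1.0
        for j, rate_j in enumerate(rates):
            if j != i:
                coeff *= rate_j / (rate_j - rate_i)
        survival += coeff * math.exp(-rate_i * t)
    return 1.0 - survival


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov distance between a sample and a CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs):
        f = cdf(x)
        # Compare the model CDF with the empirical CDF just after and just before x.
        d = max(d, (k + 1) / n - f, f - k / n)
    return d


random.seed(1)
s = 4        # assumed number of fragments downloaded in parallel
lam = 0.5    # assumed per-fragment download rate (1 / mean download time)
n = 5000     # number of simulated block downloads

# Block download time: the slowest of s parallel fragment downloads.
block_times = [max(random.expovariate(lam) for _ in range(s)) for _ in range(n)]

# Equivalent hypo-exponential model with s distinct phases.
rates = [k * lam for k in range(1, s + 1)]
d = ks_statistic(block_times, lambda t: hypoexp_cdf(t, rates))

# Asymptotic 5% critical value of the Kolmogorov-Smirnov test.
critical = 1.36 / math.sqrt(n)
print(f"KS distance = {d:.4f}, 5% critical value = {critical:.4f}")
```

A KS distance below the critical value means the hypo-exponential model is not rejected at the 5% level for this simulated sample. In a fitting context the rates would not be known in advance but estimated from the data, for example by expectation maximization or least squares, which this sketch does not attempt.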
