Estimating Properties in Dynamic Systems: The Case of Churn in P2P Networks

In many systems, such as P2P systems, the dynamics of the participating elements, known as churn, have a strong impact. Consequently, much effort has gone into characterizing churn, and in particular into capturing the session length distribution. In most cases, however, estimating this distribution rigorously is difficult. One reason is that the observation window is finite by definition, so sessions that begin before the window and/or end after it are only partly observed. This induces a bias. Although the bias tends to decrease as the observation window grows, it is difficult to quantify its magnitude or how fast it decreases. Here, we introduce a general methodology for determining whether the observation window is long enough to characterize a given property. This methodology is not specific to one case study and may be applied to any property of a dynamic system. We apply it to the study of session lengths in a massive measurement of P2P activity in the eDonkey system. We show that the measurement needs to last at least one week to obtain representative results. We also show that our methodology allows us to precisely characterize the shape of the session length distribution.
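The window-length question can be illustrated with a small sketch. The abstract does not spell out the procedure, so everything below is an assumption made for illustration rather than the authors' method: only sessions observed completely inside the window are kept, the distributions obtained with successive window lengths are compared using the Kolmogorov-Smirnov distance, and the sessions are synthetic (exponential lengths). The function names `complete_sessions`, `ks_distance`, and `window_convergence` are hypothetical.

```python
import random


def complete_sessions(sessions, window_start, window_end):
    """Lengths of the sessions that both begin and end inside the window."""
    return [end - start for start, end in sessions
            if start >= window_start and end <= window_end]


def ks_distance(a, b):
    """Largest gap between the empirical CDFs of two non-empty samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    best = 0.0
    for x in sorted(set(a) | set(b)):
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        best = max(best, abs(i / len(a) - j / len(b)))
    return best


def window_convergence(sessions, window_lengths):
    """Distance between the distributions seen with successive window lengths."""
    samples = [complete_sessions(sessions, 0.0, w) for w in window_lengths]
    return [ks_distance(prev, cur) for prev, cur in zip(samples, samples[1:])]


# Synthetic sessions (assumption): uniform start times, exponential lengths.
rng = random.Random(0)
sessions = []
for _ in range(20000):
    start = rng.uniform(0.0, 1000.0)
    sessions.append((start, start + rng.expovariate(1 / 20)))

window_lengths = [50, 100, 200, 400, 800]
distances = window_convergence(sessions, window_lengths)
for (w1, w2), d in zip(zip(window_lengths, window_lengths[1:]), distances):
    print(f"window {w1} -> {w2}: KS distance {d:.3f}")
```

Under these assumptions, long sessions are under-represented in short windows because they are more likely to cross a window boundary. Once the distance between successive windows becomes small and stays small, extending the window further changes the observed distribution very little, which is the kind of evidence one would look for before calling a measurement long enough.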
