Uniform Sampling for Directed P2P Networks

Selecting a random peer with uniform probability across a peer-to-peer (P2P) network is a fundamental function for unstructured search, data replication, and monitoring algorithms. Several techniques support such uniform sampling, but they suffer from sample bias and limited applicability. In this paper, we present a sampling algorithm that achieves a desired level of uniformity while making essentially no assumptions about the underlying P2P network. This algorithm, called doubly stochastic converge (DSC), iteratively adjusts the probabilities of crossing each link in the network during a random walk, such that the resulting transition matrix is doubly stochastic. DSC is fully decentralized and is designed to work on both directed and undirected topologies, making it suitable for virtually any P2P network. Our simulations show that DSC converges quickly on a wide variety of topologies, and that the random walks needed for sampling are short for most topologies. In simulation studies with FreePastry, we show that DSC is resilient to high levels of churn while incurring minimal sample bias.
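To illustrate why a doubly stochastic transition matrix is the target, the following minimal sketch balances the link probabilities of a small directed graph and then checks that a random walk over it approaches the uniform distribution. It is not the DSC algorithm itself: it uses centralized Sinkhorn-Knopp normalization as a stand-in for the decentralized adjustment described above. The four-node topology, the self-loops on every node, and the names `sinkhorn`, `walk_distribution`, and `ADJ` are all assumptions made for this example.

```python
def sinkhorn(adj, iters=500):
    """Scale a 0/1 adjacency matrix toward a doubly stochastic matrix.

    Centralized Sinkhorn-Knopp normalization, used here only as an
    illustrative stand-in for DSC's decentralized adjustment.
    """
    n = len(adj)
    p = [[float(x) for x in row] for row in adj]
    for _ in range(iters):
        # Normalize each column so incoming probabilities sum to 1.
        for j in range(n):
            col_sum = sum(p[i][j] for i in range(n))
            for i in range(n):
                p[i][j] /= col_sum
        # Normalize each row so outgoing probabilities sum to 1.
        for i in range(n):
            row_sum = sum(p[i])
            p[i] = [x / row_sum for x in p[i]]
    return p


def walk_distribution(p, start, steps):
    """Distribution of a random walk after `steps` hops from `start`."""
    n = len(p)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical directed topology (row = source, column = destination):
# 0->1, 1->2, 1->3, 2->0, 3->0, plus a self-loop on every node.
ADJ = [
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

P = sinkhorn(ADJ)
DIST = walk_distribution(P, start=0, steps=200)
```

In this sketch, every row and every column of `P` sums to approximately 1, so the uniform distribution is stationary and each entry of `DIST` ends up close to 1/4. The self-loops matter: they give the balancing step room to absorb the imbalance between in-links and out-links on a directed graph.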
