TPDS Information for authors

Distributed peer-to-peer systems rely on the voluntary participation of peers to manage a storage pool effectively. In such systems, data is generally replicated for performance and availability. If the storage consumed by replication is not monitored and provisioned, these benefits may not be realized. Resource constraints, performance scalability, and availability pull in different directions: aggressive replication improves availability and performance scalability (in terms of response time), whereas resource constraints limit the total storage in the network. Identifying and eliminating redundant data is therefore a fundamental problem for such systems. In this paper, we present a novel and efficient solution that addresses availability and scalability with respect to the management of redundant data. Specifically, we address the problem of duplicate elimination in systems connected over an unstructured peer-to-peer network, in which there is no a priori binding between an object and its location. We propose two randomized protocols that solve this problem in a scalable and decentralized fashion without compromising the availability requirements of the application. Performance results from both large-scale simulations and a prototype built on PlanetLab demonstrate that our protocols provide high probabilistic guarantees while incurring minimal administrative overhead.
