An Efficient and Adaptive Decentralized File Replication Algorithm in P2P File Sharing Systems

In peer-to-peer file sharing systems, file replication is widely used to reduce hot spots and improve file query efficiency. Most current file replication methods replicate files on all nodes along a client-server query path or only at its two endpoints. However, these methods either have low effectiveness or incur high overhead. Replication on the server side improves the replica hit rate, and hence lookup efficiency, but it produces overloaded nodes and cannot significantly reduce query path length. Replication on the client side can greatly reduce query path length, but it cannot guarantee a replica hit rate high enough to fully utilize the replicas. Replication along the query path solves these problems, but it incurs high overhead because it creates more replicas, many of which are underutilized. This paper presents an Efficient and Adaptive Decentralized (EAD) file replication algorithm that achieves high query efficiency and high replica utilization at a significantly lower cost. EAD improves the utilization of file replicas by selecting query traffic hubs and frequent requesters as replica nodes, and by dynamically adapting to nonuniform and time-varying file popularity and node interest. Unlike current methods, EAD creates and deletes replicas in a decentralized, self-adaptive manner while guaranteeing high replica utilization. Theoretical analysis shows the high performance of EAD. Simulation results demonstrate the efficiency and effectiveness of EAD in comparison with other approaches in both static and dynamic environments. EAD dramatically reduces the overhead of file replication and yields significant improvements in query efficiency, replica hit rate, and the reduction of overloaded nodes.
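The idea of creating and deleting replicas locally, based on the query traffic a node observes, can be illustrated with a minimal sketch. This is not the EAD algorithm as specified in the paper: the class name `ReplicaNode`, the per-period query counter, and the two thresholds are hypothetical choices made for illustration only. Each node counts the queries it initiates or forwards for a file during a period; a file whose count reaches the creation threshold is replicated, and a replica whose count falls below the deletion threshold is dropped.

```python
from collections import defaultdict


class ReplicaNode:
    """Hypothetical node that adapts its replicas to observed query traffic.

    Illustrative sketch only; the thresholds and the period-based counting
    are assumptions, not the rules defined in the paper.
    """

    def __init__(self, create_threshold, delete_threshold):
        if delete_threshold > create_threshold:
            raise ValueError("delete_threshold must not exceed create_threshold")
        self.create_threshold = create_threshold
        self.delete_threshold = delete_threshold
        # Queries seen in the current period, whether forwarded (traffic hub)
        # or initiated by this node (frequent requester).
        self.query_counts = defaultdict(int)
        self.replicas = set()

    def observe_query(self, file_id):
        """Record one query for file_id that passed through this node."""
        self.query_counts[file_id] += 1

    def end_period(self):
        """Apply the local create/delete decision and reset the counters."""
        created, deleted = [], []
        # sorted() copies the ids, so self.replicas can be modified in the loop.
        for file_id in sorted(set(self.query_counts) | self.replicas):
            rate = self.query_counts.get(file_id, 0)
            if file_id not in self.replicas and rate >= self.create_threshold:
                self.replicas.add(file_id)
                created.append(file_id)
            elif file_id in self.replicas and rate < self.delete_threshold:
                self.replicas.discard(file_id)
                deleted.append(file_id)
        self.query_counts.clear()
        return created, deleted


if __name__ == "__main__":
    node = ReplicaNode(create_threshold=3, delete_threshold=1)
    for file_id in ["a", "a", "a", "b"]:
        node.observe_query(file_id)
    print(node.end_period())  # file "a" is popular enough to replicate
    print(node.end_period())  # no traffic in this period, so "a" is dropped
```

Keeping the deletion threshold below the creation threshold is a design choice in this sketch: it avoids repeatedly creating and deleting a replica whose query rate hovers near a single cutoff.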

[7]  Haiying Shen,et al.  An Efficient and Adaptive Decentralized File Replication Algorithm in P2P File Sharing Systems , 2008, IEEE Transactions on Parallel and Distributed Systems.
