A Migratory Approach to Dynamic Replication in Large-Scale Distributed Systems

Replication is a significant component of many distributed systems. We consider migratory solutions for replica location in a large-scale distributed system. Replica location strategies decide dynamically how many times an object is replicated and where its replicas are placed, in order to ensure availability, attacker-resistance, and scalability. Most traditional replica location schemes are static and reactive to failures. This paper presents dynamic and migratory schemes for replica location. More specifically, we present a new class of probabilistic protocols called endemic protocols. Using analytical techniques borrowed from the study of non-linear systems, together with large-scale simulations, we show how an endemic protocol can be used to build a decentralized and persistent file storage service. The protocols continuously migrate a small number of replicas of each object through the host population, so an attacker cannot predict the exact number and locations of all replicas of an object. Contrary to intuition, endemic protocols can provide good performance while generating only a constant amount of network traffic at each host. Endemic protocols are resistant to massive failures and host churn: the existence of even one residual replica of an object in the system causes the system to regenerate more replicas. Analytically, endemics have the potential to preserve an object for several human generations, much like the persistent survival of folklore and of endemic diseases such as the common cold. The protocols are also very simple to implement.
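To make the migratory idea concrete, the following is a minimal toy simulation, not the protocol analyzed in the paper. It assumes a simple three-state model borrowed from epidemiology: a host either holds a replica, has recently dropped one and refuses new copies for a while, or is willing to accept one. The state names, the function `simulate`, and the parameters `drop_prob` and `recover_prob` are all hypothetical and chosen only for illustration. Each replica holder sends exactly one message per round, which is one way to picture constant per-host traffic.

```python
import random

# Hypothetical host states for this sketch (not taken from the paper).
RECEPTIVE, STASH, AVERSE = "receptive", "stash", "averse"


def simulate(num_hosts, initial_replicas, rounds, drop_prob, recover_prob, seed=0):
    """Return the number of replica holders after each round (index 0 = start).

    Assumed toy rules, applied synchronously once per round:
      * every replica holder contacts one other host chosen uniformly at
        random and copies the object to it if that host is receptive;
      * every replica holder then drops its copy with probability drop_prob
        and becomes averse (it refuses new copies for a while);
      * every averse host becomes receptive again with probability
        recover_prob.
    Requires num_hosts >= 2.
    """
    rng = random.Random(seed)
    state = [RECEPTIVE] * num_hosts
    for host in rng.sample(range(num_hosts), initial_replicas):
        state[host] = STASH

    history = [initial_replicas]
    for _ in range(rounds):
        nxt = list(state)  # next-round states, computed from the current ones
        for host, current in enumerate(state):
            if current == STASH:
                # Pick a target other than the sender: one message per round.
                target = rng.randrange(num_hosts - 1)
                if target >= host:
                    target += 1
                if state[target] == RECEPTIVE:
                    nxt[target] = STASH
                if rng.random() < drop_prob:
                    nxt[host] = AVERSE
            elif current == AVERSE and rng.random() < recover_prob:
                nxt[host] = RECEPTIVE
        state = nxt
        history.append(state.count(STASH))
    return history


if __name__ == "__main__":
    # Start from a single surviving replica, as after a massive failure.
    counts = simulate(num_hosts=200, initial_replicas=1, rounds=100,
                      drop_prob=0.1, recover_prob=0.05)
    print(counts)
```

In this toy model the set of replica holders keeps changing from round to round, which illustrates why an observer cannot pin down where the replicas are. Because the model is finite and random, a run that starts from a single replica can still die out if that replica is dropped before it spreads; the sketch makes no claim about the persistence guarantees derived in the paper.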
