Optimistic replication

Data replication is a key technology in distributed systems that enables higher availability and performance. This article surveys optimistic replication algorithms. They allow replica contents to diverge in the short term to support concurrent work practices and to tolerate failures in low-quality communication links. The importance of such techniques is increasing as collaboration through wide-area and mobile networks becomes popular.

Optimistic replication deploys algorithms not seen in traditional “pessimistic” systems. Instead of coordinating replicas synchronously, an optimistic algorithm propagates changes in the background, discovers conflicts after they happen, and reaches agreement on the final contents incrementally.

We explore the solution space for optimistic replication algorithms. This article identifies four key challenges facing optimistic replication systems (ordering operations, detecting and resolving conflicts, propagating changes efficiently, and bounding replica divergence) and provides a comprehensive survey of the techniques developed to address these challenges.
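The three steps of an optimistic algorithm described above (propagate changes in the background, discover conflicts after they happen, reach agreement incrementally) can be illustrated with a minimal Python sketch. It assumes a single replicated value, whole-state transfer during pairwise anti-entropy, and version vectors (one update counter per replica) for ordering updates. `Replica`, `update`, `pull_from`, and the keep-the-larger-value resolver are hypothetical names and a hypothetical policy chosen for illustration. They do not represent the mechanism of any particular system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def dominates(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if version vector a reflects every update that b reflects."""
    return all(a.get(replica, 0) >= count for replica, count in b.items())


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Element-wise maximum of two version vectors."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


@dataclass
class Replica:
    name: str
    value: str | None = None
    vv: dict[str, int] = field(default_factory=dict)
    conflicts: list[tuple[str | None, str | None]] = field(default_factory=list)

    def update(self, value: str) -> None:
        # Optimistic write: applied locally at once, with no coordination.
        self.vv[self.name] = self.vv.get(self.name, 0) + 1
        self.value = value

    def pull_from(self, other: Replica) -> None:
        # One round of background (anti-entropy) propagation from `other`.
        if dominates(self.vv, other.vv):
            return  # Nothing new: we have already seen all of other's updates.
        if dominates(other.vv, self.vv):
            # other's state strictly follows ours, so adopt it.
            self.value, self.vv = other.value, dict(other.vv)
            return
        # Neither vector dominates: the updates were concurrent, so the
        # conflict is discovered only now, after both writes were accepted.
        self.conflicts.append((self.value, other.value))
        # Hypothetical resolver: keep the larger value. Because max is
        # commutative, associative, and idempotent, replicas that resolve
        # the same conflicts in different orders reach the same contents.
        self.value = max(self.value, other.value)
        self.vv = merge(self.vv, other.vv)


if __name__ == "__main__":
    a, b = Replica("A"), Replica("B")
    a.update("x")  # concurrent writes made while A and B are disconnected
    b.update("y")
    a.pull_from(b)  # A detects the conflict and resolves it
    b.pull_from(a)  # B's vector is dominated, so it simply adopts A's state
    print(a.value, b.value, a.conflicts)  # prints: y y [('x', 'y')]
```

The version vectors are what separate a newer update (one vector dominates the other, so the receiver can just adopt it) from a concurrent one (neither dominates, so a conflict must be resolved). Practical systems often propagate operations or deltas instead of whole states and replace the toy resolver with application-specific merge logic.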