Practical Wide-Area Database Replication

This paper explores the architecture, implementation, and performance of a wide- and local-area database replication system. The architecture provides peer replication, supporting diverse application semantics, based on a group communication paradigm. Network partitions and merges, computer crashes and recoveries, and message omissions are all handled. Using a generic replication engine and the Spread group communication toolkit, we provide replication services for the PostgreSQL database system. We define three environments to be used as test beds: a local-area cluster, a wide-area network that spans the U.S.A., and an emulated wide-area test bed. We conduct an extensive set of experiments on these environments, varying the number of replicas and clients, the mix of updates and queries, and the network latency. Our results show that sophisticated algorithms and careful distributed systems design can make symmetric, synchronous, peer database replication a reality for both local- and wide-area networks.
