P-Store: Genuine Partial Replication in Wide Area Networks

Partial replication is a way to increase the scalability of replicated systems: updates only need to be applied to a subset of the system's sites, thus allowing replicas to handle independent parts of the workload in parallel. In this paper, we propose P-Store, a partially replicated key-value store for wide area networks. In P-Store, each transaction T optimistically executes on one or more sites and is then certified to guarantee serializability of the execution. The certification protocol is genuine: it only involves sites that replicate data items read or written by T. It also incorporates a mechanism to minimize the convoy effect. P-Store makes thrifty use of an atomic multicast service to guarantee correctness: no messages need to be multicast during T's execution, and a single message is multicast to certify T. If T is global, that is, if its execution is distributed across different geographical locations, an extra vote phase is required. Our approach may offer better scalability than previously proposed solutions that either require multiple atomic multicast messages to execute T or are non-genuine. Experimental evaluations reveal that the convoy effect plays an important role even when only one percent of the transactions are global. We also compare the scalability of our approach to that of a fully replicated solution as the proportion of global transactions and the number of sites vary.
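
To make the idea of genuine certification with a vote phase concrete, the following is a minimal sketch and not the P-Store algorithm itself. All names (`Transaction`, `Site`, `involved_sites`, `certify`) are hypothetical, the certification test is assumed to be a simple version check on the items read, and the single atomic multicast is stood in for by a direct function call over the involved sites. Failures, message ordering, and the convoy-effect mechanism are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    tid: str
    read_versions: dict  # key -> version observed during optimistic execution
    writes: dict         # key -> new value


@dataclass
class Site:
    name: str
    store: dict = field(default_factory=dict)  # key -> (value, version)

    def replicates(self, key):
        return key in self.store

    def vote(self, txn):
        # Assumed certification test: every item read by txn that this
        # site replicates must still be at the version txn observed.
        return all(
            self.store[k][1] == v
            for k, v in txn.read_versions.items()
            if k in self.store
        )

    def apply(self, txn):
        # Install the writes for locally replicated items, bumping versions.
        for k, value in txn.writes.items():
            if k in self.store:
                _, version = self.store[k]
                self.store[k] = (value, version + 1)


def involved_sites(txn, sites):
    # Genuineness: only sites replicating an item read or written take part.
    keys = set(txn.read_versions) | set(txn.writes)
    return [s for s in sites if any(s.replicates(k) for k in keys)]


def certify(txn, sites):
    # Stand-in for the single multicast that certifies txn, followed by
    # the vote phase: txn commits only if every involved site votes yes.
    involved = involved_sites(txn, sites)
    committed = all(s.vote(txn) for s in involved)
    if committed:
        for s in involved:
            s.apply(txn)
    return committed, [s.name for s in involved]
```

In this sketch a transaction touching items held by one site is certified by that site alone, while a transaction spanning several sites commits only if all of them vote yes; sites that replicate none of its items never see it.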
