Geo-replicated storage with scalable deferred update replication

Many current online services are deployed over geographically distributed sites (i.e., datacenters). Such distributed services call for geo-replicated storage, that is, storage distributed and replicated among many sites. Geographical distribution and replication can improve the locality and availability of a service. Locality is achieved by moving data closer to the users. High availability is attained by replicating data across multiple servers and sites. This paper considers a class of scalable replicated storage systems based on deferred update replication with transactional properties. The paper discusses different ways to deploy scalable deferred update replication in geographically distributed systems, considers the implications of these deployments for user-perceived latency, and proposes solutions. Our results are substantiated by a series of microbenchmarks and a social network application.
