Global-Scale Placement of Transactional Data Stores

Global-Scale Data Management (GSDM) gives systems higher fault tolerance, better read availability, and more efficient use of cloud resources. But at which datacenters should data be placed? Current cloud providers offer tens of datacenters and hundreds of edge datacenters distributed around the world. Unlike networks within a datacenter, the topology of the Wide-Area Network (WAN) is asymmetric and diverse: the latency between one pair of datacenters can be an order of magnitude larger than the latency between another pair. This makes placement a significant factor in performance. Placement is not the only factor, however: the specifics of the transaction management protocol play a crucial role in determining which placement is ideal. In this paper, we develop GPlacer, a placement optimization framework that embeds transaction protocol constraints into an optimization problem to derive both the data placement and the transaction protocol configuration that minimize overall transaction latency. In developing GPlacer, we discover counter-intuitive lessons about data placement and transaction execution practices. Our evaluation shows that applying these lessons in addition to known best practices generates deployments that reduce average transaction latency by up to 68%.
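
The abstract does not spell out GPlacer's formulation, so the following is only a minimal sketch of why placement and protocol configuration have to be chosen together. It assumes a single-leader, majority-quorum commit protocol (Paxos-style) and a made-up, asymmetric RTT matrix between four hypothetical datacenters A to D. A transaction from a client pays one round trip to the leader plus the leader's wait for a majority of replicas. The hypothetical helpers `commit_latency` and `best_placement` exhaustively search every replica set and leader to minimize the weighted average latency. This is a brute-force illustration, not GPlacer's optimization.

```python
from itertools import combinations

# Hypothetical datacenters and round-trip times in ms (illustrative numbers only).
DCS = ["A", "B", "C", "D"]
RTT = {
    ("A", "B"): 10, ("A", "C"): 80, ("A", "D"): 150,
    ("B", "C"): 70, ("B", "D"): 140, ("C", "D"): 90,
}


def rtt(x, y):
    """Symmetric RTT lookup; a datacenter reaches itself in 0 ms."""
    if x == y:
        return 0
    return RTT.get((x, y), RTT.get((y, x)))


def commit_latency(client, leader, replicas):
    """Assumed protocol: client -> leader round trip, then a majority quorum.

    The leader counts toward the quorum q = len(replicas) // 2 + 1, so it
    waits for the (q - 1)-th fastest acknowledgment from the other replicas.
    """
    q = len(replicas) // 2 + 1
    acks = sorted(rtt(leader, r) for r in replicas if r != leader)
    quorum_wait = acks[q - 2] if q >= 2 else 0
    return rtt(client, leader) + quorum_wait


def best_placement(clients, n_replicas):
    """Search all replica sets and leader choices; minimize weighted mean latency.

    clients maps a datacenter to the relative weight of transactions issued there.
    Returns (average latency, leader, replica set); ties keep the first found.
    """
    weight = sum(clients.values())
    best = None
    for replicas in combinations(DCS, n_replicas):
        for leader in replicas:
            total = sum(w * commit_latency(c, leader, replicas) for c, w in clients.items())
            avg = total / weight
            if best is None or avg < best[0]:
                best = (avg, leader, replicas)
    return best


if __name__ == "__main__":
    # Uniform load: a leader at B with replicas {A, B, C} gives 65.0 ms on average.
    print(best_placement({"A": 1, "B": 1, "C": 1, "D": 1}, 3))
    # Load skewed toward D: the leader moves to D despite its slower quorum.
    print(best_placement({"C": 1, "D": 3}, 3))
```

With uniform load the search keeps the leader in the tightly connected A/B region, but when most transactions originate at D it places the leader at D with replicas {A, C, D} (112.5 ms on average), even though D's quorum round trip is the slowest. Exhaustive search is only feasible for a handful of datacenters; a realistic formulation would need a solver and a richer protocol model.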
