Near-Optimal Latency Versus Cost Tradeoffs in Geo-Distributed Storage

By replicating data across sites in multiple geographic regions, web services can maximize availability and minimize latency for their users. However, we show that when sacrificing data consistency is not an option, service providers today must incur significantly higher cost to meet desired latency goals than the lowest cost theoretically feasible. We show that the key to addressing this sub-optimality is to 1) allow for erasure coding, not just replication, of data across data centers, and 2) mitigate the resultant increase in read and write latencies by rethinking how to enable consensus across the wide-area network. Our extensive evaluation, which mimics web service deployments on the Azure cloud service, shows that we enable near-optimal latency versus cost tradeoffs.
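To make the cost argument concrete, the following is a minimal sketch of the storage arithmetic behind replication versus erasure coding. It is not taken from the paper: the function names, the choice of 3 replicas, and the (k = 2, m = 2) code are illustrative assumptions, and the sketch assumes an MDS-style code in which any k of the k + m fragments suffice to rebuild an object.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Bytes stored per byte of user data with `copies` full replicas."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return Fraction(copies)


def erasure_overhead(k: int, m: int) -> Fraction:
    """Bytes stored per byte of user data when an object is split into
    k data fragments plus m parity fragments (assumed MDS code)."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return Fraction(k + m, k)


def tolerated_failures_replication(copies: int) -> int:
    """Site failures survivable when each site holds one full replica."""
    return copies - 1


def tolerated_failures_erasure(k: int, m: int) -> int:
    """Site failures survivable when each site holds one fragment."""
    return m


if __name__ == "__main__":
    # Hypothetical configurations chosen only for illustration.
    print("3 replicas:", replication_overhead(3),
          "x storage, tolerates", tolerated_failures_replication(3), "failures")
    print("(k=2, m=2):", erasure_overhead(2, 2),
          "x storage, tolerates", tolerated_failures_erasure(2, 2), "failures")
```

Under these assumptions, both configurations survive two site failures, but the coded layout stores 2 bytes per user byte instead of 3. The catch is that a read must gather fragments from k sites rather than one nearby replica, which is the latency increase the abstract says must be mitigated.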
