Consistency Management in Cloud Storage Systems

With the emergence of cloud computing, many organizations have moved their data to the cloud in order to provide scalable, reliable, and highly available services. Because these services rely mainly on geographically distributed data replication to guarantee good performance and high availability, consistency becomes a central concern. The CAP theorem addresses the tradeoffs between consistency, availability, and partition tolerance, and concludes that a replicated storage system can guarantee at most two of these three properties simultaneously. As data grow in size and systems grow in scale, new tradeoffs have appeared and new models for maintaining data consistency are emerging. In this chapter, we discuss the consistency issue and describe the CAP theorem, along with its limitations and its impact on big data management in large-scale systems. We then briefly introduce several consistency models used in cloud storage systems. Next, we study several state-of-the-art cloud storage systems from both industry and academia and discuss their contributions to maintaining data consistency. To complete the chapter, we present the current trend toward adaptive consistency in big data systems and introduce our dynamic adaptive consistency solution, Harmony. We conclude by discussing the open issues and challenges that consistency raises in the cloud.
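To make the tradeoff behind adaptive consistency concrete, here is a minimal sketch under a deliberately simplified, hypothetical model; it is not Harmony's actual estimator. Assume each item has n replicas, that k of them already hold the latest write when a read arrives, that a read contacts r replicas chosen uniformly at random, and that it returns the newest version it sees. The read is stale only if it misses all k up-to-date replicas, which happens with probability C(n-k, r) / C(n, r). An adaptive policy can then pick the smallest r whose stale-read probability stays under an application-defined tolerance, paying for extra replicas only when needed. The function names `stale_read_probability` and `choose_read_replicas` are illustrative, not part of any real system.

```python
from math import comb


def stale_read_probability(n: int, k: int, r: int) -> float:
    """Probability that a read of r random replicas (out of n) misses all k
    replicas holding the latest write. Hypothetical, simplified model."""
    if not (0 <= k <= n and 1 <= r <= n):
        raise ValueError("require 0 <= k <= n and 1 <= r <= n")
    # comb(n - k, r) is 0 when r > n - k, i.e. when r + k > n forces the read
    # to overlap an up-to-date replica (the quorum condition R + W > N when k = W).
    return comb(n - k, r) / comb(n, r)


def choose_read_replicas(n: int, k: int, tolerance: float) -> int:
    """Hypothetical adaptive policy: smallest r whose stale-read probability
    is at most `tolerance` (a fraction between 0 and 1)."""
    for r in range(1, n + 1):
        if stale_read_probability(n, k, r) <= tolerance:
            return r
    # Only reached when k == 0 (no replica has the write yet) and tolerance < 1:
    # no read can be fresh, so fall back to contacting every replica.
    return n


# Example: 3 replicas, latest write acknowledged by 1 replica so far.
print(choose_read_replicas(3, 1, 0.5))  # relaxed tolerance
print(choose_read_replicas(3, 1, 0.1))  # strict tolerance
```

With a relaxed tolerance the policy reads two replicas; with a strict one it reads all three, which is exactly the point where r + k > n and a stale read becomes impossible. Real systems must also estimate k, which depends on write rate and network latency; this sketch simply takes it as an input.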
