Realizing the Fault-Tolerance Promise of Cloud Storage Using Locks with Intent

Cloud computing promises easy development and deployment of large-scale, fault-tolerant, and highly available applications. Cloud storage services are a key enabler of this, because they provide reliability, availability, and fault tolerance via internal mechanisms that developers need not reason about. Despite this, challenges remain for developers of distributed cloud applications. They still need to make their code robust against failures of the machines running the code, and to reason about concurrent access to cloud storage by multiple machines. We address this problem with a new abstraction, called locks with intent, which we implement in a client library called Olive. Olive makes minimal assumptions about the underlying cloud storage, enabling it to operate on a variety of platforms, including Amazon DynamoDB and Microsoft Azure Storage. Leveraging the underlying cloud storage, Olive's locks with intent offer strong exactly-once semantics for a snippet of code despite failures and concurrent duplicate executions. To ensure exactly-once semantics, Olive incurs the unavoidable overhead of additional logging writes. However, by decoupling isolation from atomicity, it supports consistency levels ranging from eventual to transactional. This flexibility allows applications to avoid costly transactional mechanisms when weaker semantics suffice. We apply Olive's locks with intent to build several advanced storage functionalities, including snapshots, transactions via optimistic concurrency control, secondary indices, and live table re-partitioning. Our experience demonstrates that Olive eases the burden of creating correct, fault-tolerant distributed cloud applications.
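The core idea the abstract describes, pairing a lock with a durable record of the operation the lock holder intends to perform, can be illustrated with a minimal sketch. The names below (`IntentStore`, `exactly_once`, the in-memory dictionaries standing in for cloud storage tables) are hypothetical illustrations, not Olive's actual API; real clients would issue conditional writes against DynamoDB or Azure Storage rather than mutate local state.

```python
# Hedged sketch of the "lock with intent" idea: a lock is acquired
# together with a logged intent record, so a duplicate or recovering
# client observes the recorded outcome instead of re-executing.
# All names here are illustrative, not Olive's real interface.

class IntentStore:
    """In-memory stand-in for cloud storage holding locks and intents."""
    def __init__(self):
        self.locks = {}    # key -> owner currently holding the lock
        self.intents = {}  # intent_id -> ("pending" | "done", result)
        self.data = {}     # application key-value data

    def acquire(self, key, owner, intent_id, op):
        # Logging the intent alongside the lock is what lets another
        # client later finish (or observe) the operation after a crash.
        if key in self.locks:
            raise RuntimeError("lock held by %s" % self.locks[key])
        self.locks[key] = owner
        self.intents.setdefault(intent_id, ("pending", op))

    def complete(self, key, intent_id):
        status, op = self.intents[intent_id]
        if status == "pending":  # execute the snippet at most once
            result = op(self.data)
            self.intents[intent_id] = ("done", result)
        del self.locks[key]
        return self.intents[intent_id][1]


def exactly_once(store, key, intent_id, op, owner="client-1"):
    """Run op under the lock; a duplicate call with the same intent_id
    returns the recorded result instead of re-executing op."""
    record = store.intents.get(intent_id)
    if record is not None and record[0] == "done":
        return record[1]
    store.acquire(key, owner, intent_id, op)
    return store.complete(key, intent_id)
```

As a usage example, a counter increment submitted twice under the same intent ID is applied only once: the second call finds the intent marked done and returns the recorded result, which is the exactly-once behavior the abstract attributes to locks with intent (at the cost of the extra intent-logging writes it mentions).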
