Quantitative Analysis of Consistency in NoSQL Key-Value Stores

The promise of high scalability and availability has prompted many companies to replace traditional relational database management systems (RDBMS) with NoSQL key-value stores. This comes at the cost of relaxed consistency guarantees: key-value stores only guarantee eventual consistency in principle. In practice, however, many key-value stores seem to offer stronger consistency. Quantifying how well consistency properties are met is a non-trivial problem.  We address this problem by formally modeling key-value stores as probabilistic systems and quantitatively analyzing their consistency properties by both statistical model checking and implementation evaluation. We present for the first time a formal probabilistic model of Apache Cassandra, a popular NoSQL key-value store, and quantify how much Cassandra achieves various consistency guarantees under various conditions. To validate our model, we evaluate multiple consistency properties using two methods and compare them against each other. The two methods are: (1) an implementation-based evaluation of the source code; and (2) a statistical model checking analysis of our probabilistic model.

[1]  Ion Stoica,et al.  Probabilistically Bounded Staleness for Practical Partial Quorums , 2012, Proc. VLDB Endow..

[2]  Pietro Piazzolla,et al.  Performance Evaluation of NoSQL Databases , 2014, EPEW.

[3]  Mahesh Viswanathan,et al.  On Statistical Model Checking of Stochastic Systems , 2005, CAV.

[4]  José Meseguer,et al.  PMaude: Rewrite-based Specification Language for Probabilistic Object Systems , 2006, QAPL.

[5]  Sanjeev Kumar,et al.  Finding a Needle in Haystack: Facebook's Photo Storage , 2010, OSDI.

[6]  Indranil Gupta,et al.  Formal Modeling and Analysis of Cassandra in Maude , 2014, ICFEM.

[7]  Mike Hibler,et al.  An integrated experimental environment for distributed systems and networks , 2002, OSDI '02.

[8]  Mahesh Viswanathan,et al.  VESTA: A statistical model-checker and analyzer for probabilistic systems , 2005, Second International Conference on the Quantitative Evaluation of Systems (QEST'05).

[9]  Gil Neiger,et al.  Causal memory: definitions, implementation, and programming , 1995, Distributed Computing.

[10]  José Meseguer,et al.  Statistical Model Checking for Composite Actor Systems , 2012, WADT.

[11]  Werner Vogels,et al.  Eventually consistent , 2008, CACM.

[12]  Håkan L. S. Younes,et al.  Statistical probabilistic model checking with a focus on time-bounded properties , 2006, Inf. Comput..

[13]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[14]  Pietro Piazzolla,et al.  Modelling Replication in NoSQL Datastores , 2014, QEST.

[15]  Kevin Lee,et al.  Data Consistency Properties and the Trade-offs in Commercial Cloud Storage: the Consumers' Perspective , 2011, CIDR.

[16]  Doug Terry,et al.  Replicated data consistency explained through baseball , 2013, CACM.

[17]  David A. Maltz,et al.  Network traffic characteristics of data centers in the wild , 2010, IMC '10.

[18]  Wojciech M. Golab,et al.  Toward a Principled Framework for Benchmarking Consistency , 2012, HotDep.

[19]  Mauro Iacono,et al.  Performance evaluation of NoSQL big-data applications using multi-formalism models , 2014, Future Gener. Comput. Syst..

[20]  David Bermbach,et al.  Eventual consistency: How soon is eventual? An evaluation of Amazon S3's consistency behavior , 2011, MW4SOC '11.

[21]  Eric A. Brewer,et al.  Towards robust distributed systems (abstract) , 2000, PODC '00.

[22]  Adam Silberstein,et al.  Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[23]  Narciso Martí-Oliet,et al.  All About Maude - A High-Performance Logical Framework, How to Specify, Program and Verify Systems in Rewriting Logic , 2007, All About Maude.

[24]  Peter Csaba Ölveczky,et al.  Formal modeling and analysis of RAMP transaction systems , 2016, SAC.

[25]  Alan Fekete,et al.  YCSB+T: Benchmarking web-scale transactional databases , 2014, 2014 IEEE 30th International Conference on Data Engineering Workshops.

[26]  José Meseguer,et al.  Conditioned Rewriting Logic as a United Model of Concurrency , 1992, Theor. Comput. Sci..

[27]  Shahram Ghandeharizadeh,et al.  BG: A Benchmark to Evaluate Interactive Social Networking Actions , 2013, CIDR.

[28]  José Meseguer,et al.  PVeStA: A Parallel Statistical Model Checking and Quantitative Analysis Tool , 2011, CALCO.