Performance and Replica Consistency Simulation for Quorum-Based NoSQL System Cassandra

Distributed NoSQL systems such as Cassandra are widely used. However, configuring these systems to achieve maximum performance in a given environment is complicated and tedious. This paper applies a Coloured Petri Net (CPN)-based simulation method to a quorum-based system, Cassandra. By analyzing Cassandra's read and write processes, we propose a CPN model that can be used for performance analysis, optimization, and replica consistency detection. To help users understand NoSQL systems, we developed a CPN-based simulator called QuoVis. Using QuoVis, users can visualize Cassandra's read and write processes, try different hardware parameters for performance simulation, optimize system parameters such as the timeout and the data partitioning strategy, and check replica consistency. Experiments show that our model fits a real Cassandra cluster well.
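
As background for the term "quorum-based", the following minimal sketch illustrates the standard quorum overlap rule: with N replicas, a read that contacts R replicas is guaranteed to meet the latest write acknowledged by W replicas when R + W > N. This is not the paper's CPN model or any part of QuoVis; the function names `quorums_overlap` and `majority` are hypothetical and chosen only for illustration.

```python
def majority(n: int) -> int:
    """Smallest number of replicas that forms a strict majority of n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n // 2 + 1


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True if every read quorum of size r intersects every
    write quorum of size w among n replicas (i.e. r + w > n)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


if __name__ == "__main__":
    n = 3  # assumed replication factor, for illustration only
    q = majority(n)
    # Majority reads and majority writes always overlap.
    print(quorums_overlap(n, q, q))
    # Single-replica reads and writes may miss each other.
    print(quorums_overlap(n, 1, 1))
```

When the overlap condition does not hold, a read may return stale data, which is the kind of replica inconsistency a simulation can help expose.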
