Effectiveness of NoSQL and NewSQL Databases in Mobile Network Event Data: Cassandra and ParStream/Kinetic

The continuously growing amount of data has motivated the search for ever more efficient database solutions for storing and manipulating data. For big data sets, NoSQL databases have become established as alternatives to traditional SQL databases. The effectiveness of these databases has been widely tested, but the tests have focused only on key-value data, which is structurally very simple. Many application domains, such as telecommunications, involve more complex data structures. A huge amount of Mobile Network Event (MNE) data is produced by an increasing number of mobile and ubiquitous applications. MNE data is structurally predetermined and typically contains a large number of columns. Applications that handle MNE data are usually insert intensive, as a huge amount of data is generated during rush hours. NoSQL provides high scalability, and its column family stores suit MNE data well, but NoSQL does not support the ACID features of traditional relational databases. NewSQL is a new class of databases that provide the high scalability of NoSQL while still maintaining the ACID guarantees of a traditional DBMS. In this paper, we evaluate how efficiently Cassandra and ParStream/Kinetic store and aggregate MNE data, and we aim to find out whether the new kind of database technology clearly brings performance advantages over legacy database technology and offers an alternative to existing solutions. Among the NoSQL column family stores, Cassandra is an especially good choice for insert-intensive applications because of the way it handles data insertions. ParStream is a novel and advanced NewSQL-like database that has recently been integrated into Cisco Kinetic. The results of the evaluation show that ParStream is much faster than Cassandra when storing and aggregating MNE data, and that NewSQL is a very strong alternative to existing database solutions for insert-intensive applications.
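The abstract attributes Cassandra's suitability for insert-intensive workloads to the way it handles insertions. Cassandra's write path appends each write to a commit log and an in-memory memtable, and later flushes full memtables to immutable, key-sorted SSTables, so an insert never rewrites data already on disk. The Python sketch below is a toy model of that log-structured idea only, assuming a simple entry-count flush threshold and omitting compaction, tombstones and real disk I/O; it is not Cassandra's implementation, and the names `LogStructuredStore` and `memtable_limit` are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


class LogStructuredStore:
    """Toy log-structured write path (illustrative only, not Cassandra's code).

    Inserts go to an append-only commit log and an in-memory memtable. When the
    memtable holds `memtable_limit` entries, it is flushed as an immutable run
    sorted by key (standing in for an SSTable).
    """

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.commit_log: List[Tuple[str, dict]] = []  # append-only durability log
        self.memtable: Dict[str, dict] = {}
        # Each flushed run is (sorted keys, rows in the same order); newest last.
        self.sstables: List[Tuple[List[str], List[dict]]] = []

    def insert(self, key: str, row: dict) -> None:
        # A sequential log append plus an in-memory update: no read-before-write.
        self.commit_log.append((key, row))
        self.memtable[key] = row
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the whole memtable out once, sorted by key, as an immutable run.
        if not self.memtable:
            return
        keys = sorted(self.memtable)
        rows = [self.memtable[k] for k in keys]
        self.sstables.append((keys, rows))
        self.memtable = {}
        self.commit_log.clear()  # in this toy, flushed data no longer needs the log

    def get(self, key: str) -> Optional[dict]:
        # Reads check the memtable first, then runs from newest to oldest,
        # so the most recent version of a key wins.
        if key in self.memtable:
            return self.memtable[key]
        for keys, rows in reversed(self.sstables):
            i = bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return rows[i]
        return None
```

For example, with `memtable_limit=2`, inserting two event rows triggers one flush, and a later insert of an existing key shadows the older version on the next read. The design keeps every insert cheap at the cost of extra work on reads and background merging, which is the trade-off that favours insert-heavy MNE workloads.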
