Challenges in NoSQL-Based Distributed Data Storage: A Systematic Literature Review

Key-value stores (KVSs) are the simplest and most flexible class of NoSQL databases, and they have become highly popular in recent years because of salient features such as availability, portability, reliability, and low operational cost. From a software engineering perspective, the chief obstacle for KVSs is achieving the required software quality attributes (consistency, throughput, latency, security, performance, load balancing, and query processing). This paper presents a Systematic Literature Review (SLR) of state-of-the-art research in the KVS domain and, from it, identifies the major challenges and proposed solutions. The review covers 45 papers published between 2010 and 2018 that were found to be closely relevant to the study area. The results show that performance is addressed in 31% of the studies, consistency in 20%, latency and throughput in 16%, query processing in 13%, security in 11%, and load balancing in 9%. The studies use different models and techniques for execution: indexing was used in 20% of the studies, hashing in 13%, caching and security techniques together in 9%, batching in 5%, encoding and Paxos techniques together in 4%, and other techniques in 36%. This systematic review is intended to help researchers design key-value stores as efficient storage. As directions for future work, trust and privacy are quality attributes that remain to be addressed; given the widespread and growing popularity of KVSs, addressing them would open the way to deploying KVSs with proper protection.
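The quality-attribute percentages can be sanity-checked against the 45 reviewed papers. The short sketch below converts each reported share into the approximate number of papers it implies. The per-attribute counts are not stated in the abstract; they are an assumption derived from the rounded percentages, and the function and variable names are purely illustrative.

```python
# Shares of the 45 reviewed papers per quality attribute, as reported in the abstract.
TOTAL_PAPERS = 45
ATTRIBUTE_SHARE_PCT = {
    "performance": 31,
    "consistency": 20,
    "latency and throughput": 16,
    "query processing": 13,
    "security": 11,
    "load balancing": 9,
}


def implied_counts(shares_pct, total):
    """Return the approximate paper count implied by each rounded percentage.

    These counts are an inference from the rounded shares, not figures
    taken from the review itself.
    """
    return {name: round(total * pct / 100) for name, pct in shares_pct.items()}


counts = implied_counts(ATTRIBUTE_SHARE_PCT, TOTAL_PAPERS)
for name, n in counts.items():
    print(f"{name}: about {n} of {TOTAL_PAPERS} papers")

# Both totals should be consistent with each paper being counted once.
print("share total (%):", sum(ATTRIBUTE_SHARE_PCT.values()))
print("implied paper total:", sum(counts.values()))
```

Under this reading the six shares add up to 100% and the implied counts add up to 45, which suggests that each paper was assigned to a single primary quality attribute.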
