NoSQL Overview and Performance Testing of HBase Over Multiple Nodes with MySQL

The growing number of web-based applications in social networking, media, biology, physics, and the Internet of Things continuously generates large volumes of data (Big Data), reaching terabytes, petabytes, and even zettabytes over short periods of time. These applications also issue an immense number of read and write requests that must be served with low latency. Storing and analyzing such large amounts of mixed ASCII and non-ASCII data efficiently, economically, and quickly is therefore an immediate concern. Conventional database systems such as MySQL are incapable of handling such large volumes of data in real time. It has been claimed that column-based NoSQL databases such as Accumulo, Cassandra, and HBase, and document-based databases such as Apache CouchDB, Couchbase, and MongoDB, can handle such data volumes efficiently. In this work, we focus on Apache HBase, a column-based NoSQL distributed database management system from the Big Data ecosystem that runs on the distributed file system provided by Hadoop (HDFS). We begin by discussing NoSQL, HBase, and the relationship between HBase and Hadoop. We then explain some important features of HBase and discuss its advantages and limitations for distributed data processing compared with other NoSQL database management systems. Finally, we report experiments comparing the execution time of HBase with that of the traditional database MySQL as the data size increases.
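To make "column-based" concrete, the following Python sketch models only the logical data model that HBase inherits from Google's Bigtable. In this model, a table is a sparse map from a row key to "family:qualifier" columns, each column holds timestamped versions, and rows are kept in row-key order so that range scans are cheap. The class `ToyColumnStore` and its methods are hypothetical illustrations, not the HBase client API. The sketch also ignores region splitting, HDFS storage, and write-ahead logging.

```python
from collections import defaultdict


class ToyColumnStore:
    """Hypothetical in-memory sketch of the HBase/Bigtable logical model:
    row key -> "family:qualifier" -> {timestamp: value}.
    Not the HBase client API; distribution and persistence are omitted."""

    def __init__(self, families):
        # Column families are declared up front, as when an HBase table is created.
        self.families = set(families)
        self.rows = defaultdict(dict)  # row key -> {column: {ts: value}}

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are created on demand; a row stores only the cells it has,
        # so empty columns cost nothing (the table is sparse).
        self.rows[row].setdefault(column, {})[ts] = value

    def get(self, row, column):
        # Return the newest version of the cell, or None if it is absent.
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, stop):
        # Rows are visited in lexicographic row-key order, so a range scan
        # walks the sorted keys in [start, stop).
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, self.rows[key]


if __name__ == "__main__":
    t = ToyColumnStore(["info"])
    t.put("user#001", "info:name", "Ada", ts=1)
    t.put("user#001", "info:name", "Ada L.", ts=2)  # newer version of the same cell
    t.put("user#002", "info:city", "Pune", ts=1)
    print(t.get("user#001", "info:name"))                       # Ada L.
    print([k for k, _ in t.scan("user#000", "user#002")])       # ['user#001']
```

Because row keys are stored in sorted order, how the row key is designed determines which rows are scanned together. The sketch uses Python strings where HBase actually uses byte arrays.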
