Graph DBs vs. Column-Oriented Stores: A Pure Performance Comparison

Cloud computing has substantially changed how information is stored and how applications run. For one or more clusters to operate as a cloud, a middleware framework such as Apache Hadoop [17] is needed to provide reliability, scalability, and distributed computing. Once this infrastructure is in place, a software framework can be installed on top of it to serve as the interface to the applications that users develop. In this setting, that software is a NoSQL database. This paper addresses the problem of searching data in widely used NoSQL databases for cloud computing. Two categories of NoSQL databases are compared: a column-oriented key-value store, HBase [6], and a highly available graph database, Neo4j [11]. HBase is a distributed, scalable storage system that runs on top of HDFS and was designed after Google's Bigtable [4]. Neo4j was designed and developed as a reliable database optimized for graph structures rather than tables; it is a robust, scalable, high-performance, and highly available database that supports ACID transactions and queries written in the Cypher language. The aim of this paper is to create a novel system that decides whether a query should be sent to a key-value store or to a graph database for execution. To that end, an experimental pure performance comparison was carried out between Apache HBase and Neo4j for a variety of queries, which were implemented in Java using each system's API.

[1] Lars George, et al. HBase: The Definitive Guide, 2011.

[2] Egor V. Kostylev, et al. Containment of Data Graph Queries, 2014, ICDT.

[3] Divyakant Agrawal, et al. MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware Services, 2011, 2011 IEEE 12th International Conference on Mobile Data Management.

[4] Claudio Gutierrez, et al. Survey of graph database models, 2008, CSUR.

[5] Yang Zheng, et al. Performance analysis and testing of HBase based on its architecture, 2013, 2013 IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS).

[6] Jim Webber, et al. Graph Databases: New Opportunities for Connected Data, 2013.

[7] Kristina Chodorow, et al. MongoDB: The Definitive Guide, 2010.

[8] Pete Wyckoff, et al. Hive - A Warehousing Solution Over a Map-Reduce Framework, 2009, Proc. VLDB Endow.

[9] Yixin Chen, et al. A comparison of a graph database and a relational database: a data provenance perspective, 2010, ACM SE '10.

[10] Wilson C. Hsieh, et al. Bigtable: A Distributed Storage System for Structured Data, 2006, TOCS.

[11] René Peinl, et al. Performance of graph query languages: comparison of Cypher, Gremlin and native access in Neo4j, 2013, EDBT '13.

[12] Prashant Malik, et al. Cassandra: a decentralized structured storage system, 2010, OPSR.

[13] Peter T. Wood, et al. Query languages for graph databases, 2012, SGMD.

[14] Hairong Kuang, et al. The Hadoop Distributed File System, 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[15] E. Brewer, et al. CAP twelve years later: How the "rules" have changed, 2012, Computer.

[16] Werner Vogels, et al. Dynamo: Amazon's highly available key-value store, 2007, SOSP.