A High-Performance Distributed Relational Database System for Scalable OLAP Processing

The scalability of systems such as Hive and Spark SQL, which are built on top of big data platforms, has enabled query processing over very large data sets. However, the per-node performance of these systems is typically low compared to traditional relational databases. Conversely, Massively Parallel Processing (MPP) databases do not scale as well as these systems. We present HRDBMS, a fully implemented distributed shared-nothing relational database developed with the goal of improving the scalability of OLAP queries. HRDBMS achieves high scalability through a principled combination of techniques from relational and big data systems with novel communication and work-distribution techniques. While we also support serializable transactions, the system has not been optimized for this use case. HRDBMS runs on a custom distributed and asynchronous execution engine that was built from the ground up to support highly parallelized operator implementations. Our experimental comparison with Hive, Spark SQL, and Greenplum confirms that HRDBMS's scalability is on par with Hive and Spark SQL (up to 96 nodes) while its per-node performance can compete with MPP databases like Greenplum.
