Towards Scalable Real-time Analytics: An Architecture for Scale-out of OLxP Workloads

We present an overview of our work on the SAP HANA Scale-out Extension, a novel distributed database architecture designed to support large-scale analytics over real-time data. The platform supports high-performance OLAP with massive scale-out capabilities while concurrently serving OLTP workloads. This dual capability enables analytics over data that changes in real time and allows fine-grained, user-specified service level agreements (SLAs) on data freshness. We advocate decoupling core database components such as query processing, concurrency control, and persistence, a design choice made possible by advances in high-throughput, low-latency networks and storage devices. We provide full ACID guarantees and build on a logical timestamp mechanism to provide MVCC-based snapshot isolation without requiring synchronous updates of replicas. Instead, we use asynchronous update propagation and guarantee consistency through timestamp validation. We provide a view into the design and development of a large-scale data management platform for real-time analytics, driven by the needs of modern enterprise customers.
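The interplay of logical timestamps, asynchronous propagation, and freshness SLAs can be illustrated with a small sketch. It is not the system's implementation: the names `Replica`, `apply`, `read`, `snapshot_for`, and `max_staleness` are hypothetical, and measuring staleness as a difference of logical commit timestamps is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    """Hypothetical read replica that applies committed writes asynchronously."""

    applied_ts: int = 0  # highest commit timestamp applied so far
    # key -> list of (commit_ts, value), appended in ascending commit_ts order
    versions: Dict[str, List[Tuple[int, str]]] = field(default_factory=dict)

    def apply(self, commit_ts: int, writes: Dict[str, str]) -> None:
        """Apply one committed transaction; entries must arrive in timestamp order."""
        if commit_ts <= self.applied_ts:
            raise ValueError("log entries must arrive in increasing timestamp order")
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((commit_ts, value))
        self.applied_ts = commit_ts

    def read(self, key: str, snapshot_ts: int) -> Optional[str]:
        """Return the newest version of key visible at snapshot_ts, or None."""
        visible = None
        for ts, value in self.versions.get(key, []):
            if ts <= snapshot_ts:
                visible = value
        return visible


def snapshot_for(replica: Replica, latest_commit_ts: int,
                 max_staleness: int) -> Optional[int]:
    """Timestamp validation (assumed form): return a snapshot timestamp if the
    replica lags the latest commit by at most max_staleness, else None."""
    if latest_commit_ts - replica.applied_ts > max_staleness:
        return None
    return replica.applied_ts


replica = Replica()
replica.apply(1, {"x": "a"})
replica.apply(2, {"x": "b", "y": "c"})

# Suppose the transactional side has committed up to timestamp 5,
# while this replica has only applied updates up to timestamp 2.
strict = snapshot_for(replica, latest_commit_ts=5, max_staleness=1)
relaxed = snapshot_for(replica, latest_commit_ts=5, max_staleness=3)
```

In this sketch a query with a strict freshness bound is rejected by the lagging replica (`strict` is `None`) and would have to wait or be routed elsewhere, while a query with a relaxed bound reads a consistent snapshot at the replica's applied timestamp.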

[28]  M. Tamer Özsu,et al.  ConfluxDB: Multi-Master Replication for Partitioned Snapshot Isolation Databases , 2014, Proc. VLDB Endow..