SAP HANA - From Relational OLAP Database to Big Data Infrastructure

SAP HANA started as one of the best-performing database engines for OLAP workloads, strictly pursuing a main-memory-centric architecture and exploiting hardware developments such as large numbers of cores and main memories in the terabyte range. In this paper, we outline the steps from a traditional relational database engine to a Big Data infrastructure comprising different methods to handle data of different volume, arriving with different velocity, and showing a fairly large degree of variety. To make the presentation of this transformation process more tangible, we discuss two major technical topics: HANA native integration points and extension points for collaboration with Hadoop-based data management infrastructures. The overall goal of this paper is to (a) review current application patterns and the resulting technical challenges and (b) paint the big picture for upcoming architectural designs with the SAP HANA database as the core of an SAP Big Data infrastructure.
