PolyBench: The First Benchmark for Polystores

Modern business intelligence requires data processing not only across a huge variety of domains but also across different paradigms, such as relational, stream, and graph models. This variety is a challenge for existing systems, which typically support only one or a few data models. Polystores, systems that integrate different specialized data processing engines to enable fast processing of a large variety of data models, were proposed as a solution to this challenge and have received wide attention in both academia and industry. Yet there is no standard for assessing the performance of polystores. The goal of this work is to develop the first benchmark for polystores. To capture the flexibility of polystores, we focus on high-level features so that our benchmark suite can be executed on a large set of polystore solutions.
