Design Considerations Towards AI-Driven Co-Processor Accelerated Database Management

Adopting AI techniques for query optimization is an ongoing research interest in the database community. The search space for the best plan is growing drastically with the increasing heterogeneity of target hardware, the novel tuning choices offered, and co-processing. Hence, there is a pressing need for AI techniques that can identify such a plan in a reasonable time frame. Though AI-based solutions for improving query processing exist, there is still a need for principled system designs that can incorporate the different innovations, leverage synergy effects, and meet production-readiness expectations when using AI. In this paper, we propose a series of seven ideal design characteristics we envision for such systems. We then make the case for revisiting the traditional Mariposa system and considering its market concepts as a useful starting point for new system designs that support the identified characteristics. Altogether, we expect that this short paper could be a modest contribution towards AI-driven heterogeneous processing, emphasizing the practical aspects of a supportive and principled overall design.
