GridFormation: Towards Self-Driven Online Data Partitioning using Reinforcement Learning

In this paper, we define a research agenda for developing a general framework that supports online, autonomous tuning of data partitioning and layouts through a reinforcement learning formulation. We establish the core elements of our approach: agent, environment, action space, and supporting components. Externally predicted workloads and the current physical design serve as input to our agent. The environment guides the search process by generating immediate rewards based on fresh cost estimates, computed for either the entire workload or a sample of its queries, and by determining which actions are possible in a given state. This set of actions is configurable, enabling the representation of different partitioning problems. For use in an online setting, the agent learns a fixed-length sequence of n actions that maximizes the temporal reward for the predicted workload. Through an initial implementation, we assert the feasibility of our approach. To conclude, we list open challenges for this work.
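The formulation above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy workload, the cost model, the merge-only action set, and the use of tabular Q-learning are all assumptions chosen for illustration, and every name (`WORKLOAD`, `cost`, `actions`, `step`, `train`, `greedy_plan`) is hypothetical. The environment returns an immediate reward equal to the drop in estimated workload cost and decides which actions are legal in a state; the agent learns a fixed-length sequence of `N_STEPS` actions.

```python
import itertools
import random

# Hypothetical predicted workload: each query is the set of columns it reads.
WORKLOAD = [{"a", "b"}, {"a", "b"}, {"c"}, {"c", "d"}]
COLUMNS = ("a", "b", "c", "d")
N_STEPS = 2  # fixed episode length n (assumed value)


def initial_state():
    # Current physical design: one vertical partition per column.
    return frozenset(frozenset([c]) for c in COLUMNS)


def cost(state, workload=WORKLOAD):
    # Assumed cost model: a query pays the width of every partition it
    # touches, plus one unit of overhead per touched partition.
    return sum(len(part) + 1 for q in workload for part in state if part & q)


def actions(state):
    # The environment decides the legal actions: do nothing (None),
    # or merge any two existing partitions.
    parts = sorted(state, key=sorted)
    return [None] + list(itertools.combinations(parts, 2))


def step(state, action):
    # Apply an action; the immediate reward is the reduction in estimated cost.
    if action is None:
        return state, 0
    left, right = action
    next_state = frozenset((state - {left, right}) | {left | right})
    return next_state, cost(state) - cost(next_state)


def train(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    # Tabular Q-learning over (state, time step, action) with epsilon-greedy
    # exploration; unseen entries default to 0.
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = initial_state()
        for t in range(N_STEPS):
            acts = actions(state)
            if rng.random() < epsilon:
                action = rng.choice(acts)
            else:
                action = max(acts, key=lambda a: q.get((state, t, a), 0.0))
            next_state, reward = step(state, action)
            future = 0.0
            if t + 1 < N_STEPS:
                future = max(q.get((next_state, t + 1, a), 0.0)
                             for a in actions(next_state))
            old = q.get((state, t, action), 0.0)
            q[(state, t, action)] = old + alpha * (reward + gamma * future - old)
            state = next_state
    return q


def greedy_plan(q):
    # Read out the learned fixed-length action sequence.
    state, plan = initial_state(), []
    for t in range(N_STEPS):
        action = max(actions(state), key=lambda a: q.get((state, t, a), 0.0))
        plan.append(action)
        state, _ = step(state, action)
    return plan, state


if __name__ == "__main__":
    plan, final = greedy_plan(train())
    print("plan:", plan)
    print("cost before:", cost(initial_state()), "cost after:", cost(final))
```

Under this assumed cost model, the only beneficial merge combines columns `a` and `b`, which two of the four queries read together, so the learned plan should lower the toy workload cost from 14 to 12. Swapping the body of `actions` is how a different partitioning problem (for example, horizontal splits) would be expressed in this sketch.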
