Automated Vertical Partitioning with Deep Reinforcement Learning

Finding the right vertical partitioning scheme to match a workload is one of the essential database optimization problems. With proper partitioning, queries and management tasks can skip unnecessary data, which improves their performance. Algorithmic approaches are common for determining a partitioning scheme, with solutions shaped by their choice of cost models and pruning heuristics. Despite their advantages, these approaches can be inefficient because they do not improve with experience (e.g., by learning from errors in cost estimates or in the heuristics employed). In this paper, we consider the feasibility of a general machine learning solution to overcome such drawbacks. Specifically, we extend the work in GridFormation, mapping the partitioning task to a reinforcement learning (RL) task. We validate our proposal experimentally using a TPC-H database and workload, HDD cost models, and the Google Dopamine framework for deep RL. We report early evaluations using three standard DQN agents, establishing that agents can match the results of state-of-the-art algorithms. We find that convergence is easily achievable for single table-workload pairs, but that generalizing to random workloads requires further work. We also report competitive runtimes for our agents with both GPU and CPU inference, outperforming some state-of-the-art algorithms as the number of attributes in a table increases.
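
To make the idea of casting vertical partitioning as an RL task more concrete, the following is a minimal sketch of one possible formulation; it is not the formulation evaluated in this paper. It assumes that a state is a set of attribute groups, that an action merges two groups, and that the reward is the reduction in estimated workload cost. The attribute widths, the workload, the row count, the seek penalty, and the names `cost`, `merge_gain`, `PartitionEnv`, and `greedy_episode` are all hypothetical, and the cost function is a toy stand-in for an HDD cost model. A greedy loop takes the place of a trained DQN agent.

```python
from itertools import combinations

# Hypothetical toy inputs (assumptions, not taken from TPC-H):
WIDTHS = [4, 8, 16, 100]               # attribute widths in bytes
WORKLOAD = [{0, 1}, {0, 1}, {2}, {3}]  # attributes read by each query
ROWS = 1000                            # assumed table cardinality
SEEK_COST = 5000                       # assumed penalty per partition touched


def cost(partitioning):
    """Toy I/O cost: a query reads every partition it touches in full."""
    total = 0
    for query in WORKLOAD:
        for group in partitioning:
            if group & query:
                total += SEEK_COST + ROWS * sum(WIDTHS[a] for a in group)
    return total


def merge_gain(state, action):
    """Return (cost reduction, next state) for merging groups i and j."""
    i, j = action
    merged = state[i] | state[j]
    nxt = [g for k, g in enumerate(state) if k not in (i, j)] + [merged]
    return cost(state) - cost(nxt), nxt


class PartitionEnv:
    """State: list of attribute groups. Action: indices of two groups to merge."""

    def reset(self):
        # Start from the column-wise layout, one attribute per group.
        self.state = [frozenset([a]) for a in range(len(WIDTHS))]
        return self.state

    def actions(self):
        return list(combinations(range(len(self.state)), 2))

    def step(self, action):
        reward, self.state = merge_gain(self.state, action)
        done = len(self.state) == 1
        return self.state, reward, done


def greedy_episode(env):
    """Stand-in for a learned policy: apply the best merge while it pays off."""
    env.reset()
    while len(env.state) > 1:
        best = max(env.actions(), key=lambda a: merge_gain(env.state, a)[0])
        if merge_gain(env.state, best)[0] <= 0:
            break
        env.step(best)
    return env.state


if __name__ == "__main__":
    final = greedy_episode(PartitionEnv())
    print(sorted(sorted(g) for g in final), cost(final))
```

In this toy setting, attributes 0 and 1 are always read together, so merging them saves one seek per query, while any merge involving the wide attribute 3 increases the bytes read. A learned agent would replace `greedy_episode` with a policy trained from the reward signal returned by `step`.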