Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning

In this extended abstract, we propose a new query-scheduling technique whose explicit goal is to reduce disk reads and thereby implicitly improve query performance. We introduce a learned scheduler that exploits overlapping data reads among incoming queries and learns a scheduling strategy that increases buffer pool cache hits. The scheduler relies on deep reinforcement learning to produce workload-specific scheduling strategies that target long-term performance benefits while adapting to previously unseen data access patterns. We present results from a proof-of-concept prototype, demonstrating that learned schedulers can offer significant performance improvements over hand-crafted scheduling heuristics. Ultimately, we argue that this is a promising research direction at the intersection of machine learning and databases.

Olga Papaemmanouil | Chi Zhang | Ryan Marcus | Anat Kleiman
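To make the core idea concrete, the Python sketch below shows why query order matters for a buffer pool. It is not the prototype described in the abstract. It uses a hypothetical LRU buffer pool measured in pages and a hand-crafted greedy heuristic (`greedy_overlap_order`, a name introduced here) of the kind the abstract compares against. It also assumes, purely for illustration, that the set of pages each query reads is known before scheduling. In a reinforcement learning formulation, one could instead learn the choice made by the `max(...)` call, for example using avoided disk reads as the reward; that framing is an assumption, not a description of the authors' design.

```python
from collections import OrderedDict


class LRUBufferPool:
    """Hypothetical toy buffer pool: fixed capacity in pages, LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page id -> None, oldest access first

    def read(self, page):
        """Access one page; return True on a cache hit, False on a disk read."""
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
            return True
        self.pages[page] = None
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return False


def run(order, queries, capacity):
    """Execute queries in `order` on a cold pool; return total disk reads."""
    pool = LRUBufferPool(capacity)
    disk_reads = 0
    for qid in order:
        for page in queries[qid]:
            if not pool.read(page):
                disk_reads += 1
    return disk_reads


def greedy_overlap_order(queries, capacity):
    """Hand-crafted heuristic: repeatedly pick the pending query whose pages
    overlap most with the pages currently cached (ties: arrival order).

    Assumption (illustration only): each query's page set is known up front.
    """
    pool = LRUBufferPool(capacity)
    pending = list(queries)
    order = []
    while pending:
        cached = set(pool.pages)
        best = max(pending, key=lambda q: len(set(queries[q]) & cached))
        pending.remove(best)
        order.append(best)
        for page in queries[best]:  # simulate running it to update the cache
            pool.read(page)
    return order


if __name__ == "__main__":
    # Toy workload: q1/q3 share pages 1-3, q2/q4 share pages 4-6.
    queries = {"q1": [1, 2, 3], "q2": [4, 5, 6], "q3": [1, 2, 3], "q4": [4, 5, 6]}
    print(run(list(queries), queries, capacity=3))  # FIFO: 12
    order = greedy_overlap_order(queries, capacity=3)
    print(order, run(order, queries, capacity=3))  # ['q1', 'q3', 'q2', 'q4'] 6
```

On this toy workload, arrival (FIFO) order incurs 12 disk reads, while the overlap-aware order runs queries that share pages back to back and incurs 6. The greedy rule looks only one step ahead; the abstract's argument is that a learned policy can target long-term benefits instead of the best immediate choice.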