Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations

Assemblies of modular subsystems are being pressed into service to perform sensing, reasoning, and decision making in high-stakes, time-critical tasks in areas such as transportation, healthcare, and industrial automation. We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system. System-wide optimization is challenging because it is a combinatorial problem over the joint configurations of all modules. Local attempts to boost the performance of a specific module by modifying its configuration often lead to losses in the overall utility of the system, as the distribution of inputs to downstream modules changes drastically. We present metareasoning techniques that consider a rich representation of the input, monitor the state of the entire pipeline, and adjust the configuration of modules on the fly so as to maximize the utility of the system's operation. We show significant improvement in both real-world and synthetic pipelines across a variety of reinforcement learning techniques.
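To make the idea of context-dependent, system-wide configuration concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a hypothetical two-module pipeline, a one-word input context, and a made-up utility table (`UTILITY`), and it uses a plain tabular epsilon-greedy contextual bandit as a stand-in for the reinforcement learning techniques mentioned above. All names and numbers are illustrative assumptions.

```python
import itertools
import random

# Hypothetical input contexts and per-module options (assumptions for illustration).
CONTEXTS = ["simple", "cluttered"]
CONFIGS = list(itertools.product(["fast", "accurate"], ["light", "heavy"]))

# Made-up end-to-end utility of each joint configuration in each context.
# The best joint choice differs by context, so a fixed configuration is suboptimal.
UTILITY = {
    ("simple", ("fast", "light")): 0.9,
    ("simple", ("fast", "heavy")): 0.6,
    ("simple", ("accurate", "light")): 0.7,
    ("simple", ("accurate", "heavy")): 0.4,
    ("cluttered", ("fast", "light")): 0.2,
    ("cluttered", ("fast", "heavy")): 0.5,
    ("cluttered", ("accurate", "light")): 0.6,
    ("cluttered", ("accurate", "heavy")): 0.8,
}


def train(steps=2000, epsilon=0.2, seed=0):
    """Learn a utility estimate per (context, joint configuration) pair."""
    rng = random.Random(seed)
    q = {(c, a): 0.0 for c in CONTEXTS for a in CONFIGS}
    n = {key: 0 for key in q}
    for _ in range(steps):
        ctx = rng.choice(CONTEXTS)  # observe the context of the next input
        if rng.random() < epsilon:
            cfg = rng.choice(CONFIGS)  # explore a random joint configuration
        else:
            cfg = max(CONFIGS, key=lambda a: q[(ctx, a)])  # exploit the estimate
        reward = UTILITY[(ctx, cfg)]  # end-to-end utility of the whole pipeline
        n[(ctx, cfg)] += 1
        q[(ctx, cfg)] += (reward - q[(ctx, cfg)]) / n[(ctx, cfg)]  # running mean
    return q


def greedy_policy(q):
    """Pick the highest-estimated joint configuration for each context."""
    return {c: max(CONFIGS, key=lambda a: q[(c, a)]) for c in CONTEXTS}


if __name__ == "__main__":
    for context, config in greedy_policy(train()).items():
        print(context, "->", config)
```

The learner scores joint configurations by the utility of the whole pipeline rather than tuning each module in isolation, which is the distinction the abstract draws. Enumerating every joint configuration, as this sketch does, is only feasible for toy cases because the number of combinations grows multiplicatively with the number of modules.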
