Safe Learning for Near Optimal Scheduling

We investigate how synthesis techniques and learning techniques can be combined to obtain safe and near-optimal schedulers for a preemptible task scheduling problem. We study both model-based learning techniques with PAC guarantees and model-free learning techniques based on shielded deep Q-learning. We have implemented the new learning algorithms and evaluated them experimentally.
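To make the idea of shielding concrete, the following is a minimal sketch under stated assumptions and not the paper's implementation. It swaps the deep Q-network for tabular Q-learning, and it uses a hand-written shield on a hypothetical toy model: one hard task with `work` remaining execution units and `time_left` units until its deadline, competing with a soft task that yields reward 1 per unit of processor time. The names `safe_actions`, `step`, and `train`, the initial state `(2, 4)`, and all hyperparameters are illustrative choices.

```python
import random

# Hypothetical toy model: state = (work, time_left) for a single hard task.
# "hard" runs the hard task for one time unit; "soft" runs a soft task
# instead and earns reward 1. The deadline is missed if work > 0 when
# time_left reaches 0.

def safe_actions(state):
    """Shield: return only actions after which the deadline can still be met."""
    work, time_left = state
    allowed = []
    if work > 0:
        allowed.append("hard")
    # Running the soft task is safe only if the remaining work still fits.
    if work <= time_left - 1:
        allowed.append("soft")
    return allowed

def step(state, action):
    """Deterministic transition of the toy model; returns (next_state, reward)."""
    work, time_left = state
    if action == "hard":
        return (work - 1, time_left - 1), 0.0
    return (work, time_left - 1), 1.0

def train(episodes=200, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning in which exploration and greedy choice are both shielded."""
    rng = random.Random(seed)
    q = {}
    missed = 0
    for _ in range(episodes):
        state = (2, 4)  # assumed initial state: 2 units of work, deadline in 4
        while state[1] > 0:
            allowed = safe_actions(state)
            if rng.random() < epsilon:
                action = rng.choice(allowed)
            else:
                action = max(allowed, key=lambda a: q.get((state, a), 0.0))
            nxt, reward = step(state, action)
            # Bootstrap only over actions the shield allows in the next state.
            if nxt[1] == 0:
                future = 0.0
            else:
                future = max(q.get((nxt, a), 0.0) for a in safe_actions(nxt))
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = nxt
        if state[0] > 0:
            missed += 1
    return q, missed

if __name__ == "__main__":
    q_table, missed_deadlines = train()
    print("missed deadlines during training:", missed_deadlines)
```

Because the learner only ever selects from the shielded action set, the hard deadline holds during exploration as well as after convergence. The reward signal is then left to optimise only how much processor time goes to the soft task.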
