Safe Learning for Near Optimal Scheduling

In this paper, we investigate the combination of synthesis techniques and learning techniques to obtain safe and near optimal schedulers for a preemptible task scheduling problem. We study both model-based learning techniques with PAC guarantees and model-free learning techniques based on shielded deep Q-learning. The new learning algorithms have been implemented to conduct experimental evaluations.

Gilles Geeraerts | Shibashis Guha | Jean-François Raskin | Guillermo A. Pérez

[1] Jan Kretínský, et al. Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints, 2018, CONCUR.

[2] P. Ramadge, et al. Supervisory control of a class of discrete event processes, 1987.

[3] Marc Peter Deisenroth, et al. Deep Reinforcement Learning: A Brief Survey, 2017, IEEE Signal Processing Magazine.

[4] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[5] Peter Dayan, et al. Technical Note: Q-Learning, 1992, Machine Learning.

[6] Gilles Geeraerts, et al. Monte Carlo Tree Search guided by Symbolic Advice for MDPs, 2020, CONCUR.

[7] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.

[8] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.

[9] Ufuk Topcu, et al. Safe Reinforcement Learning via Shielding, 2017, AAAI.

[10] Thomas A. Henzinger, et al. Faster Statistical Model Checking for Unbounded Temporal Properties, 2015, TACAS.

[11] J. Filar, et al. Competitive Markov Decision Processes, 1996.

[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[13] Giorgio C. Buttazzo, et al. Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications, 2007.

[14] Jean-François Raskin, et al. Safe and Optimal Scheduling for Hard and Soft Tasks, 2018, FSTTCS.

[15] Krishnendu Chatterjee, et al. Optimizing Expectation with Guarantees in POMDPs, 2017, AAAI.

[16] Sebastian Junges, et al. A Storm is Coming: A Modern Probabilistic Model Checker, 2017, CAV.

[17] A. Neyman, et al. Stochastic games, 1981.

[18] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.

[19] Krishnendu Chatterjee, et al. Run-Time Optimization for Learned Controllers Through Quantitative Games, 2019, CAV.

[20] Wolfgang Thomas, et al. On the Synthesis of Strategies in Infinite Games, 1995, STACS.

[21] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[22] Leslie G. Valiant, et al. A theory of the learnable, 1984, CACM.

[23] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[24] Krishnendu Chatterjee, et al. Robustness of Structurally Equivalent Concurrent Parity Games, 2011, FoSSaCS.

[25] Eilon Solan. Continuity of the Value of Competitive Markov Decision Processes, 2003.

[26] Ufuk Topcu, et al. Probably Approximately Correct MDP Learning and Control With Temporal Logic Constraints, 2014, Robotics: Science and Systems.

[27] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.