Omega-Regular Objectives in Model-Free Reinforcement Learning

We provide the first solution for model-free reinforcement learning of \(\omega \)-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of \(\omega \)-regular objectives to an almost-sure reachability problem, and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. We compile \(\omega \)-regular properties into limit-deterministic Büchi automata instead of the traditional Rabin automata; this choice sidesteps difficulties that have marred previous proposals. Our approach allows us to apply model-free, off-the-shelf reinforcement learning algorithms to compute optimal strategies from observations of the MDP. We present an experimental evaluation of our technique on benchmark learning problems.
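The core idea in the abstract can be illustrated with a minimal sketch: on the product of an MDP with a limit-deterministic Büchi automaton, each accepting transition is redirected, with probability \(1-\zeta \), to a rewarding sink, so that maximizing the probability of reaching the sink approximates maximizing the probability of satisfying the objective, and any off-the-shelf learner (here, tabular Q-learning) applies. The three-state toy product, the parameter `ZETA`, and all hyperparameters below are illustrative assumptions, not the paper's construction or benchmarks.

```python
import random

# Toy product MDP (a hypothetical example, not one of the paper's benchmarks):
#   state 0: initial; state 1: "good" region; state 2: an absorbing trap.
#   Action 'b' moves to state 1, and entering state 1 is an accepting
#   transition of the product; action 'a' falls into the trap, from which
#   no transition is accepting. This models the objective
#   "visit state 1 infinitely often".

ZETA = 0.9                  # continuation probability on accepting transitions
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2
ACTIONS = ('a', 'b')

def step(state, action):
    """Return (next_state, accepting?) for the toy product MDP."""
    if state == 2:          # trap: absorbing, never accepting
        return 2, False
    if action == 'b':       # entering state 1 is an accepting transition
        return 1, True
    return 2, False

def q_learn(episodes=2000, horizon=50, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1, 2) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            nxt, accepting = step(s, a)
            # The reduction: on an accepting transition, with probability
            # 1 - ZETA jump to the rewarding sink (terminal reward 1).
            if accepting and rng.random() > ZETA:
                Q[(s, a)] += ALPHA * (1.0 - Q[(s, a)])
                break
            target = GAMMA * max(Q[(nxt, x)] for x in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = nxt
    return Q

Q = q_learn()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in (0, 1)}
```

Under these assumptions the learned strategy chooses `'b'` in states 0 and 1, i.e. it keeps taking accepting transitions, which is exactly the behavior that satisfies the Büchi objective almost surely.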

[1]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[2]  Moshe Y. Vardi Automatic verification of probabilistic concurrent finite state programs , 1985, 26th Annual Symposium on Foundations of Computer Science (sfcs 1985).

[3]  Calin Belta,et al.  Temporal Logic Motion Planning and Control With Probabilistic Satisfaction Guarantees , 2012, IEEE Transactions on Robotics.

[5]  Wolfgang Thomas, et al.  Automata on Infinite Objects , 1991, Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics.

[6]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[7]  Ufuk Topcu,et al.  Probably Approximately Correct MDP Learning and Control With Temporal Logic Constraints , 2014, Robotics: Science and Systems.

[8]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[9]  Wojciech Zaremba,et al.  OpenAI Gym , 2016, ArXiv.

[10]  Jan Kretínský,et al.  Limit-Deterministic Büchi Automata for Linear Temporal Logic , 2016, CAV.

[11]  Christel Baier,et al.  Principles of model checking , 2008 .

[12]  Jan Kretínský,et al.  MoChiBA: Probabilistic LTL Model Checking Using Limit-Deterministic Büchi Automata , 2016, ATVA.

[13]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[14]  Olivier Carton,et al.  Computing the Rabin Index of a Parity Automaton , 1999, RAIRO Theor. Informatics Appl..

[15]  Zohar Manna,et al.  Formal verification of probabilistic systems , 1997 .

[16]  Toshimitsu Ushio,et al.  Learning an Optimal Control Policy for a Markov Decision Process Under Linear Temporal Logic Specifications , 2015, 2015 IEEE Symposium Series on Computational Intelligence.

[17]  Lihong Li,et al.  PAC model-free reinforcement learning , 2006, ICML.

[18]  Tom Eccles,et al.  An investigation of model-free planning , 2019, ICML.

[19]  Lijun Zhang,et al.  Lazy Probabilistic Model Checking without Determinisation , 2013, CONCUR.

[21]  Zohar Manna,et al.  The Temporal Logic of Reactive and Concurrent Systems , 1991, Springer New York.

[22]  Jean-Eric Pin,et al.  Infinite words - automata, semigroups, logic and games , 2004, Pure and applied mathematics series.

[24]  Jan Kretínský,et al.  The Hanoi Omega-Automata Format , 2015, CAV.

[25]  Krishnendu Chatterjee,et al.  Automata with Generalized Rabin Pairs for Probabilistic Model Checking and LTL Synthesis , 2013, CAV.

[26]  Martin A. Riedmiller Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.

[27]  Peter Vrancx,et al.  Reinforcement Learning: State-of-the-Art , 2012 .

[28]  A. Shwartz,et al.  Handbook of Markov decision processes : methods and applications , 2002 .

[29]  Calin Belta,et al.  Reinforcement learning with temporal logic rewards , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[30]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[31]  S. Shankar Sastry,et al.  A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications , 2014, 53rd IEEE Conference on Decision and Control.

[32]  Daniel Kroening,et al.  Logically-Correct Reinforcement Learning , 2018, ArXiv.

[33]  Daniel Kroening,et al.  Certified Reinforcement Learning with Logic Guidance , 2019, Artif. Intell..

[35]  Krishnendu Chatterjee,et al.  Verification of Markov Decision Processes Using Learning Algorithms , 2014, ATVA.

[36]  Robert K. Brayton, et al.  The Rabin Index and Chain Automata, with Applications to Automata and Games , 1995, CAV.

[37]  Eugene A. Feinberg,et al.  Handbook of Markov Decision Processes , 2002 .

[38]  Mihalis Yannakakis,et al.  The complexity of probabilistic verification , 1995, JACM.

[39]  Amir Pnueli,et al.  Verification of multiprocess probabilistic protocols , 2005, Distributed Computing.

[40]  Marta Z. Kwiatkowska,et al.  PRISM 4.0: Verification of Probabilistic Real-Time Systems , 2011, CAV.