Environment-Independent Task Specifications via GLTL

We propose a new task-specification language for Markov decision processes, designed to improve on reward functions by being environment independent. The language is a variant of Linear Temporal Logic (LTL), extended to probabilistic specifications in a way that permits approximations to be learned in finite time. We provide several small environments that demonstrate the advantages of our geometric LTL (GLTL) language and illustrate how it can be used to specify standard reinforcement-learning tasks straightforwardly.
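The abstract does not spell out the operator semantics, but one natural reading of "geometric" is that each temporal operator is given a probabilistic expiration: for example, an "eventually" operator with parameter mu must be satisfied before an expiration event that fires each step with probability mu, so the deadline is geometrically distributed rather than tied to a particular environment's horizon. The sketch below is a minimal Monte Carlo illustration of that reading only; the names and parameters (sample_satisfaction, reach_goal_prob, mu) are hypothetical and not taken from the paper.

```python
import random

def sample_satisfaction(reach_goal_prob, mu, max_steps=10_000):
    """Simulate one episode of a toy 'eventually_mu goal' task.

    Each step the agent reaches the goal with probability `reach_goal_prob`;
    otherwise the operator expires with probability `mu` (a geometrically
    distributed deadline). Returns True iff the goal is reached first.
    """
    for _ in range(max_steps):
        if random.random() < reach_goal_prob:
            return True   # goal reached before the operator expired
        if random.random() < mu:
            return False  # operator expired first
    return False          # safeguard for vanishingly small probabilities

def estimate_satisfaction_prob(reach_goal_prob, mu, trials=100_000):
    """Monte Carlo estimate of Pr[eventually_mu goal] for the toy task."""
    wins = sum(sample_satisfaction(reach_goal_prob, mu) for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    # With per-step success probability 0.1 and expiration parameter mu = 0.05,
    # the closed-form value is 0.1 / (0.1 + 0.9 * 0.05) ~= 0.69.
    print(estimate_satisfaction_prob(reach_goal_prob=0.1, mu=0.05))
```

Under this reading, the satisfaction probability of a formula depends only on the per-step dynamics of the relevant propositions, not on any environment-specific horizon or reward scale, which is the sense in which the specification is environment independent.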
