Foundations for Restraining Bolts: Reinforcement Learning with LTLf/LDLf Restraining Specifications
Giuseppe De Giacomo | Marco Favorito | Luca Iocchi | Fabio Patrizi
[1] Scott Sanner,et al. Non-Markovian Rewards Expressed in LTL: Guiding Search Via Reward Shaping , 2021, SOCS.
[2] Dana Fisman,et al. Learning Regular Languages via Alternating Automata , 2015, IJCAI.
[3] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[4] Giuseppe De Giacomo,et al. Linear Temporal Logic and Linear Dynamic Logic on Finite Traces , 2013, IJCAI.
[5] Wil M. P. van der Aalst,et al. DECLARE: Full Support for Loosely-Structured Processes , 2007, 11th IEEE International Enterprise Distributed Object Computing Conference (EDOC 2007).
[6] John K. Slaney,et al. Semipositive LTL with an Uninterpreted Past Operator , 2005, Log. J. IGPL.
[7] Long Ji Lin,et al. Reinforcement Learning of Non-Markov Decision Processes , 1995, Artif. Intell.
[8] Frits W. Vaandrager,et al. Model learning , 2017, Commun. ACM.
[9] Fred Kröger,et al. Temporal Logic of Programs , 1987, EATCS Monographs on Theoretical Computer Science.
[10] Nick Hawes,et al. Optimal and dynamic planning for Markov decision processes with co-safe LTL specifications , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[11] Craig Boutilier,et al. Rewarding Behaviors , 1996, AAAI/IAAI, Vol. 2.
[12] Giuseppe De Giacomo,et al. Automata-Theoretic Foundations of FOND Planning for LTLf and LDLf Goals , 2018, IJCAI.
[13] Dana S. Scott,et al. Finite Automata and Their Decision Problems , 1959, IBM J. Res. Dev.
[14] Andrew G. Barto,et al. An intrinsic reward mechanism for efficient exploration , 2006, ICML.
[15] Sheila A. McIlraith,et al. Teaching Multiple Tasks to an RL Agent using LTL , 2018, AAMAS.
[16] John K. Slaney,et al. Decision-Theoretic Planning with non-Markovian Rewards , 2011, J. Artif. Intell. Res.
[17] Hector J. Levesque,et al. GOLOG: A Logic Programming Language for Dynamic Domains , 1997, J. Log. Program.
[18] Laurent Orseau,et al. Safely Interruptible Agents , 2016, UAI.
[19] Anca D. Dragan,et al. The Off-Switch Game , 2016, IJCAI.
[20] Ufuk Topcu,et al. Safe Reinforcement Learning via Shielding , 2017, AAAI.
[21] Sheila A. McIlraith,et al. Monitoring Plan Optimality During Execution , 2007, ICAPS.
[22] Stuart J. Russell,et al. Research Priorities for Robust and Beneficial Artificial Intelligence , 2015, AI Mag.
[23] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[24] Charles Gretton,et al. A More Expressive Behavioral Logic for Decision-Theoretic Planning , 2014, PRICAI.
[25] Charles Gretton. Gradient-Based Relational Reinforcement Learning of Temporally Extended Policies , 2007, ICAPS.
[26] Raymond Reiter,et al. Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems , 2001.
[27] Dana Angluin,et al. Learning Regular Sets from Queries and Counterexamples , 1987, Inf. Comput.
[28] R. A. Brooks,et al. Intelligence without Representation , 1991, Artif. Intell.
[29] Orna Kupferman,et al. On High-Quality Synthesis , 2016, CSR.
[30] Alberto Camacho. Decision-Making with Non-Markovian Rewards: From LTL to automata-based reward shaping , 2017.
[31] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994.
[32] Ufuk Topcu,et al. Correct-by-synthesis reinforcement learning with temporal logic constraints , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[33] John Schulman,et al. Concrete Problems in AI Safety , 2016, ArXiv.
[34] Ronen I. Brafman,et al. LTLf/LDLf Non-Markovian Rewards , 2018, AAAI.
[35] Marek Grzes,et al. Reward Shaping in Episodic Reinforcement Learning , 2017, AAMAS.
[36] John G. Gibbons. Knowledge in Action , 2001.
[37] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[38] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[39] Sam Devlin,et al. Dynamic potential-based reward shaping , 2012, AAMAS.
[40] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[41] Jorge A. Baier,et al. Beyond Classical Planning: Procedural Control Knowledge and Preferences in State-of-the-Art Planners , 2008, AAAI.
[42] Nick Hawes,et al. Optimal Policy Generation for Partially Satisfiable Co-Safe LTL Specifications , 2015, IJCAI.
[43] Ufuk Topcu,et al. Environment-Independent Task Specifications via GLTL , 2017, ArXiv.
[44] Orna Kupferman,et al. Formally Reasoning About Quality , 2016, J. ACM.