Joint Inference of Reward Machines and Policies for Reinforcement Learning
Ufuk Topcu | Rupak Majumdar | Zhe Xu | Daniel Neider | Bo Wu | Ivan Gavran | Yousef Ahmad
[1] Sheila A. McIlraith,et al. Learning Reward Machines for Partially Observable Reinforcement Learning , 2019, NeurIPS.
[2] Christof Löding,et al. Abstract Learning Frameworks for Synthesis , 2015, TACAS.
[3] Marijn J. H. Heule,et al. Exact DFA Identification Using SAT Solvers , 2010, ICGI.
[4] Markus Wulfmeier,et al. Maximum Entropy Deep Inverse Reinforcement Learning , 2015, arXiv:1507.04888.
[5] Alberto Camacho,et al. LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning , 2019, IJCAI.
[6] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[7] Ufuk Topcu,et al. Safe Reinforcement Learning via Shielding , 2017, AAAI.
[8] Sheila A. McIlraith,et al. Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning , 2018, ICML.
[9] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[10] Ufuk Topcu,et al. Transfer of Temporal Logic Formulas in Reinforcement Learning , 2019, IJCAI.
[11] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[12] Nils Jansen,et al. Regular Model Checking Using Solver Technologies and Automata Learning , 2013, NASA Formal Methods.
[13] J. Oncina,et al. Inferring Regular Languages in Polynomial Updated Time , 1992 .
[14] Benedikt Bollig,et al. libalf: The Automata Learning Framework , 2010, CAV.
[15] Calin Belta,et al. Q-Learning for robust satisfaction of signal temporal logic specifications , 2016, 2016 IEEE 55th Conference on Decision and Control (CDC).
[16] Calin Belta,et al. Reinforcement learning with temporal logic rewards , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[17] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[18] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[19] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[20] Ufuk Topcu,et al. Learning from Demonstrations with High-Level Side Information , 2017, IJCAI.
[21] Prashant Doshi,et al. A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress , 2018, Artif. Intell..
[22] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[23] Ufuk Topcu,et al. Probably Approximately Correct MDP Learning and Control With Temporal Logic Constraints , 2014, Robotics: Science and Systems.
[24] Peter Stone,et al. Cross-domain transfer for reinforcement learning , 2007, ICML '07.
[25] E. Mark Gold,et al. Complexity of Automaton Identification from Given Data , 1978, Inf. Control..
[26] Dan Klein,et al. Modular Multitask Reinforcement Learning with Policy Sketches , 2016, ICML.
[27] Sheila A. McIlraith,et al. Teaching Multiple Tasks to an RL Agent using LTL , 2018, AAMAS.
[28] Joshua B. Tenenbaum,et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.
[29] Daniel Neider,et al. Applications of automata learning in verification and synthesis , 2014 .
[30] Jeffrey D. Ullman,et al. Introduction to Automata Theory, Languages and Computation , 1979 .
[31] Dana Angluin,et al. Learning Regular Sets from Queries and Counterexamples , 1987, Inf. Comput..
[32] Jan Peters,et al. Regularizing Reinforcement Learning with State Abstraction , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[33] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[34] Jeffrey Shallit,et al. A Second Course in Formal Languages and Automata Theory , 2008 .
[35] Michael L. Littman,et al. State Abstractions for Lifelong Reinforcement Learning , 2018, ICML.