Value Preserving State-Action Abstractions
Doina Precup | Khimya Khetarpal | Michael L. Littman | Dilip Arumugam | David Abel | Nathan Umbanhowar
[1] Benjamin Van Roy, et al. Feature-based methods for large scale dynamic programming, 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.
[2] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[3] Romain Laroche, et al. On Value Function Representation of Long Horizon Problems, 2018, AAAI.
[4] Kate Saenko, et al. Learning Multi-Level Hierarchies with Hindsight, 2017, ICLR.
[5] Philip S. Thomas, et al. Natural Option Critic, 2019, AAAI.
[6] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[7] Parag Singla, et al. A Novel Abstraction Framework for Online Planning: Extended Abstract, 2015, AAMAS.
[8] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[9] Stuart J. Russell, et al. Markovian State and Action Abstractions for MDPs via Hierarchical MCTS, 2016, IJCAI.
[10] Phuong Nguyen, et al. Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning, 2013, ICML.
[11] Balaraman Ravindran, et al. Model Minimization in Hierarchical Reinforcement Learning, 2002, SARA.
[12] Michael L. Littman, et al. Near Optimal Behavior via Approximate State Abstraction, 2016, ICML.
[13] R. Bellman. A Markovian Decision Process, 1957.
[14] Andrew G. Barto, et al. Automated State Abstraction for Options using the U-Tree Algorithm, 2000, NIPS.
[15] Thomas G. Dietterich. State Abstraction in MAXQ Hierarchical Reinforcement Learning, 1999, NIPS.
[16] Andrew G. Barto, et al. Skill Characterization Based on Betweenness, 2008, NIPS.
[17] Shie Mannor, et al. Approximate Value Iteration with Temporally Extended Actions, 2015, J. Artif. Intell. Res..
[18] Marlos C. Machado, et al. A Laplacian Framework for Option Discovery in Reinforcement Learning, 2017, ICML.
[19] Doina Precup, et al. Automatic Construction of Temporally Extended Actions for MDPs Using Bisimulation Metrics, 2011, EWRL.
[20] Alessandro Lazaric, et al. Regret Minimization in MDPs with Options without Prior Knowledge, 2017, NIPS.
[21] Lihong Li, et al. PAC-inspired Option Discovery in Lifelong Reinforcement Learning, 2014, ICML.
[22] Craig Boutilier, et al. Abstraction and Approximate Decision-Theoretic Planning, 1997, Artif. Intell..
[23] Peter Stone, et al. Hierarchical model-based reinforcement learning: R-max + MAXQ, 2008, ICML '08.
[24] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst..
[25] Nan Jiang, et al. Abstraction Selection in Model-based Reinforcement Learning, 2015, ICML.
[26] Shie Mannor, et al. Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations, 2014, ICML.
[27] George Konidaris, et al. Constructing Abstraction Hierarchies Using a Skill-Symbol Loop, 2015, IJCAI.
[28] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.
[29] George Konidaris, et al. Discovering Options for Exploration by Minimizing Cover Time, 2019, ICML.
[30] Michael L. Littman, et al. State Abstractions for Lifelong Reinforcement Learning, 2018, ICML.
[31] Andrew G. Barto, et al. Using relative novelty to identify useful temporal abstractions in reinforcement learning, 2004, ICML.
[32] Robert Givan, et al. Model Minimization in Markov Decision Processes, 1997, AAAI/IAAI.
[33] Alessandro Lazaric, et al. Regret Bounds for Learning State Representations in Reinforcement Learning, 2019, NeurIPS.
[34] Lawson L. S. Wong, et al. State Abstraction as Compression in Apprenticeship Learning, 2019, AAAI.
[35] Doina Precup, et al. Methods for Computing State Similarity in Markov Decision Processes, 2006, UAI.
[36] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res..
[37] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell..
[38] Balaraman Ravindran, et al. SMDP Homomorphisms: An Algebraic Approach to Abstraction in Semi-Markov Decision Processes, 2003, IJCAI.
[39] Marie desJardins, et al. Portable Option Discovery for Automated Learning Transfer in Object-Oriented Markov Decision Processes, 2015, IJCAI.
[40] Kim G. Larsen, et al. Bisimulation through Probabilistic Testing, 1991, Inf. Comput..
[41] B. Fox. Discretizing dynamic programs, 1973.
[42] Andrew G. Barto, et al. Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining, 2009, NIPS.
[43] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell..
[44] Nan Jiang, et al. The Dependence of Effective Planning Horizon on Model Accuracy, 2015, AAMAS.
[45] Balaraman Ravindran, et al. Relativized Options: Choosing the Right Transformation, 2003, ICML.
[46] Doina Precup, et al. Metrics for Finite Markov Decision Processes, 2004, AAAI.
[47] Doina Precup, et al. The Option-Critic Architecture, 2016, AAAI.
[48] Stuart J. Russell, et al. Efficient Reinforcement Learning with Hierarchies of Machines by Leveraging Internal Transitions, 2017, IJCAI.
[49] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[50] Balaraman Ravindran. Approximate Homomorphisms: A framework for non-exact minimization in Markov Decision Processes, 2004.
[51] Doina Precup, et al. Multi-time Models for Temporally Abstract Planning, 1997, NIPS.
[52] Benjamin Van Roy. Performance Loss Bounds for Approximate Value Iteration with State Aggregation, 2006, Math. Oper. Res..
[53] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[54] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[55] Ward Whitt, et al. Approximations of Dynamic Programs, II, 1979, Math. Oper. Res..
[56] Doina Precup, et al. The Termination Critic, 2019, AISTATS.
[57] David Andre, et al. State abstraction for programmable reinforcement learning agents, 2002, AAAI/IAAI.
[58] Peter Stone, et al. State Abstraction Discovery from Irrelevant State Variables, 2005, IJCAI.
[59] Nan Jiang, et al. Improving UCT planning via approximate homomorphisms, 2014, AAMAS.
[60] Christopher Grimm, et al. Mitigating Planner Overfitting in Model-Based Reinforcement Learning, 2018, ArXiv.
[61] Andrew G. Barto, et al. Building Portable Options: Skill Transfer in Reinforcement Learning, 2007, IJCAI.
[62] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res..
[63] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[64] Leslie Pack Kaelbling, et al. From Skills to Symbols: Learning Symbolic Representations for Abstract High-Level Planning, 2018, J. Artif. Intell. Res..
[65] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[66] Ward Whitt, et al. Approximations of Dynamic Programs, I, 1978, Math. Oper. Res..
[67] Alessandro Lazaric, et al. Exploration-Exploitation in MDPs with Options, 2016.
[68] Robert Givan, et al. Bounded-parameter Markov decision processes, 2000, Artif. Intell..
[69] Doina Precup, et al. Bounding Performance Loss in Approximate MDP Homomorphisms, 2008, NIPS.
[70] Sergey Levine, et al. Near-Optimal Representation Learning for Hierarchical Reinforcement Learning, 2018, ICLR.
[71] Marcus Hutter, et al. Extreme State Aggregation beyond MDPs, 2014, ALT.
[72] David Silver, et al. Value Iteration with Options and State Aggregation, 2015, ArXiv.
[73] Peter Stone, et al. The utility of temporal abstraction in reinforcement learning, 2008, AAMAS.
[74] Marcus Hutter, et al. Performance Guarantees for Homomorphisms Beyond Markov Decision Processes, 2019, AAAI.