Toward Generalization of Automated Temporal Abstraction to Partially Observable Reinforcement Learning
[1] Doina Precup, et al. Learning Options in Reinforcement Learning, 2002, SARA.
[2] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[3] R. Bellman. Dynamic Programming, 1957, Science.
[4] Michael O. Duff, et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.
[5] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[6] Thomas Hofmann, et al. Automated Hierarchy Discovery for Planning in Partially Observable Environments, 2007.
[7] Andrew G. Barto, et al. Using relative novelty to identify useful temporal abstractions in reinforcement learning, 2004, ICML.
[8] Ron Sun, et al. Self-segmentation of sequences: automatic formation of hierarchies of sequential behaviors, 2000, IEEE Trans. Syst. Man Cybern. Part B.
[9] Bernhard Hengst, et al. Discovering Hierarchy in Reinforcement Learning with HEXQ, 2002, ICML.
[10] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[11] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.
[12] Andrew G. Barto, et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.
[13] Amy McGovern, et al. AcQuire-macros: An Algorithm for Automatically Learning Macro-actions, 1998.
[14] Ronald E. Parr. Hierarchical control and learning for Markov decision processes, 1998.
[15] William S. Lovejoy, et al. Computationally Feasible Bounds for Partially Observed Markov Decision Processes, 1991, Oper. Res.
[16] Leslie Pack Kaelbling, et al. Approximate Planning in POMDPs with Macro-Actions, 2003, NIPS.
[17] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.
[18] Reda Alhajj, et al. Improving reinforcement learning by using sequence trees, 2010, Machine Learning.
[19] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[20] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[21] Eric A. Hansen, et al. An Improved Grid-Based Approximation Algorithm for POMDPs, 2001, IJCAI.
[22] Shie Mannor, et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning, 2002, ECML.
[23] A. Cassandra, et al. Exact and approximate algorithms for partially observable Markov decision processes, 1998.
[24] R. Andrew McCallum, et al. Hidden state and reinforcement learning with instance-based state identification, 1996, IEEE Trans. Syst. Man Cybern. Part B.
[25] Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.
[26] Geoffrey J. Gordon, et al. Finding Approximate POMDP Solutions Through Belief Compression, 2011, J. Artif. Intell. Res.
[27] Chris Watkins, et al. Learning from delayed rewards, 1989.
[28] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[29] Joelle Pineau, et al. Tractable planning under uncertainty: exploiting structure, 2004.
[30] Reid G. Simmons, et al. Heuristic Search Value Iteration for POMDPs, 2004, UAI.
[31] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.
[32] T. Komeda, et al. Reinforcement learning in non-Markovian environments using automatic discovery of subgoals, 2007, SICE Annual Conference 2007.
[33] Reda Alhajj, et al. Positive Impact of State Similarity on Reinforcement Learning Performance, 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[34] Faruk Polat, et al. Abstraction in Model Based Partially Observable Reinforcement Learning Using Extended Sequence Trees, 2012, 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology.
[35] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[36] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[37] Leslie Pack Kaelbling, et al. Acting under uncertainty: discrete Bayesian models for mobile-robot navigation, 1996, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS '96.