Generating Memoryless Policies Faster Using Automatic Temporal Abstractions for Reinforcement Learning with Hidden State
[1] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[2] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[3] Chris Watkins. Learning from delayed rewards, 1989.
[4] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[5] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[6] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[7] Andrew G. Barto, et al. Using relative novelty to identify useful temporal abstractions in reinforcement learning, 2004, ICML.
[8] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[9] Takeshi Yoshikawa, et al. An Acquiring Method of Macro-Actions in Reinforcement Learning, 2006, IEEE International Conference on Systems, Man and Cybernetics.
[10] Shie Mannor, et al. Dynamic abstraction in reinforcement learning via clustering, 2004, ICML.
[11] Lonnie Chrisman. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[12] Mark D. Pendrith, et al. An Analysis of Direct Reinforcement Learning in Non-Markovian Domains, 1998, ICML.
[13] Andrew McCallum. Reinforcement learning with selective perception and hidden state, 1996.
[14] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.
[15] Michael L. Littman. Memoryless policies: theoretical limitations and practical results, 1994.
[16] Takashi Komeda, et al. Solving POMDPs with Automatic Discovery of Subgoals, 2009.
[17] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[18] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[19] John Loch, et al. Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes, 1998, ICML.
[20] Carla E. Brodley, et al. Proceedings of the twenty-first international conference on Machine learning, 2004, International Conference on Machine Learning.
[21] Bernhard Hengst. Discovering Hierarchy in Reinforcement Learning with HEXQ, 2002, ICML.
[22] Andrew G. Barto, et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.
[23] Amy McGovern, et al. AcQuire-macros: An Algorithm for Automatically Learning Macro-actions, 1998.
[24] Sean R. Eddy. What is dynamic programming?, 2004, Nature Biotechnology.
[25] Leslie Pack Kaelbling, et al. Approximate Planning in POMDPs with Macro-Actions, 2003, NIPS.
[26] Paul A. Crook. Learning in a state of confusion: employing active perception and reinforcement learning in partially observable worlds, 2007.
[27] Shie Mannor, et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning, 2002, ECML.
[28] Reda Alhajj, et al. Improving reinforcement learning by using sequence trees, 2010, Machine Learning.
[29] Faruk Polat, et al. Abstraction in Model Based Partially Observable Reinforcement Learning Using Extended Sequence Trees, 2012, IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology.
[30] Groupe PDMIA. Markov Decision Processes in Artificial Intelligence, 2009.
[31] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[32] Doina Precup, et al. Learning Options in Reinforcement Learning, 2002, SARA.
[33] Michael O. Duff, et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.