Meta-Learning of Exploration and Exploitation Parameters with Replacing Eligibility Traces
[1] Günther Palm, et al. Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax, 2011, KI.
[2] Neil D. Lawrence, et al. Missing Data in Kernel PCA, 2006, ECML.
[3] Andreea C. Bostan, et al. The basal ganglia communicate with the cerebellum, 2010, Proceedings of the National Academy of Sciences.
[4] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[5] Michel Tokic, et al. Teaching Reinforcement Learning using a Physical Robot, 2012.
[6] Y. Niv. Reinforcement learning in the brain, 2009.
[7] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[8] Friedhelm Schwenker, et al. Learning a Strategy with Neural Approximated Temporal-Difference Methods in English Draughts, 2010, 20th International Conference on Pattern Recognition (ICPR).
[9] Kunikazu Kobayashi, et al. A Meta-learning Method Based on Temporal Difference Error, 2009, ICONIP.
[10] Martin A. Riedmiller, et al. Learning to Drive a Real Car in 20 Minutes, 2007, Frontiers in the Convergence of Bioscience and Information Technologies.
[11] Günther Palm, et al. Adaptive Exploration Using Stochastic Neurons, 2012, ICANN.
[12] Michel Tokic. Adaptive epsilon-Greedy Exploration in Reinforcement Learning Based on Value Difference, 2010, KI.
[13] Kenji Doya. Meta-learning in Reinforcement Learning, 2002, Neural Networks.
[14] Richard S. Sutton, et al. Reinforcement Learning with Replacing Eligibility Traces, 1996, Machine Learning.
[15] Michel Tokic. Adaptive ε-greedy Exploration in Reinforcement Learning Based on Value Differences, 2010.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[17] Günther Palm, et al. Gradient Algorithms for Exploration/Exploitation Trade-Offs: Global and Local Variants, 2012, ANNPR.
[18] Mehryar Mohri, et al. Multi-armed Bandit Algorithms and Empirical Evaluation, 2005, ECML.
[19] Shigenobu Kobayashi, et al. Reinforcement Learning in POMDPs with Function Approximation, 1997, ICML.
[20] Marco Wiering. Explorations in efficient reinforcement learning, 1999.
[21] C. J. C. H. Watkins. Learning from delayed rewards, 1989, PhD thesis, University of Cambridge.
[22] Günther Palm, et al. Robust Exploration/Exploitation Trade-Offs in Safety-Critical Applications, 2012.
[23] Warren B. Powell, et al. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, 2006, Machine Learning.
[24] Friedhelm Schwenker, et al. Neural Approximation of Monte Carlo Policy Evaluation Deployed in Connect Four, 2008, ANNPR.
[25] Kenji Doya. What are the computations of the cerebellum, the basal ganglia and the cerebral cortex?, 1999, Neural Networks.
[26] Peter Auer. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2002, J. Mach. Learn. Res.
[27] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[28] P. Dayan, et al. Choice values, 2006, Nature Neuroscience.