Action Selection Methods in a Robotic Reinforcement Learning Scenario
[1] Michel Tokic. Adaptive ε-greedy Exploration in Reinforcement Learning Based on Value Differences, 2010.
[2] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[3] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[4] Oliver Lemon, et al. Reinforcement Learning for Adaptive Dialogue Systems, 2017.
[5] Stefan Wermter, et al. Training Agents With Interactive Reinforcement Learning and Contextual Affordances, 2016, IEEE Transactions on Cognitive and Developmental Systems.
[6] M.A. Wiering, et al. Reinforcement Learning in Continuous Action Spaces, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[7] Y. Niv. Reinforcement learning in the brain, 2009.
[8] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[9] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[10] Sebastian Thrun, et al. Efficient Exploration In Reinforcement Learning, 1992.
[11] John S. Bridle, et al. Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters, 1989, NIPS.
[12] Marco Wiering, et al. Explorations in efficient reinforcement learning, 1999.
[13] Angela J. Yu, et al. Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration, 2007, Philosophical Transactions of the Royal Society B: Biological Sciences.
[14] Michel Tokic, et al. Adaptive epsilon-Greedy Exploration in Reinforcement Learning Based on Value Difference, 2010, KI.
[15] Stephen R. Marsland, et al. Machine Learning - An Algorithmic Perspective, 2009, Chapman and Hall / CRC machine learning and pattern recognition series.
[16] Chris Watkins, et al. Learning from delayed rewards, 1989.
[17] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[18] Günther Palm, et al. Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax, 2011, KI.