Robust Exploration/Exploitation Trade-Offs in Safety-Critical Applications
[1] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[2] Günther Palm, et al. Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax, 2011, KI.
[3] Raja Chatila, et al. On Fault Tolerance and Robustness in Autonomous Systems, 2004.
[4] Ralph Neuneier, et al. Risk-Sensitive Reinforcement Learning, 1998, Machine Learning.
[5] T. Smithers. Autonomy in Robots and Other Agents, 1997, Brain and Cognition.
[6] Andrew G. Barto, et al. Lyapunov Design for Safe Reinforcement Learning, 2003, J. Mach. Learn. Res..
[7] Matthias Heger, et al. Consideration of Risk in Reinforcement Learning, 1994, ICML.
[8] P. Dayan, et al. Cortical substrates for exploratory decisions in humans, 2006, Nature.
[9] Peter Geibel, et al. Reinforcement Learning with Bounded Risk, 2001, ICML.
[10] Michel Tokic. Adaptive ε-greedy Exploration in Reinforcement Learning Based on Value Differences, 2010.
[11] Chris Watkins, et al. Learning from delayed rewards, 1989.
[12] Dirk Söffker, et al. On Risk Formalization of On-Line Risk Assessment for Safe Decision Making in Robotics, 2010.
[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[14] Steffen Udluft, et al. Safe exploration for reinforcement learning, 2008, ESANN.
[15] Warren B. Powell, et al. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, 2006, Machine Learning.
[16] H. Jin Kim, et al. Stable adaptive control with online learning, 2004, NIPS.
[17] Peter Dayan, et al. Technical Note: Q-Learning, 2004, Machine Learning.
[18] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.