[1] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[2] Laurent Orseau,et al. Safely Interruptible Agents , 2016, UAI.
[3] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[4] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[5] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[6] Andreas Krause,et al. Safe Model-based Reinforcement Learning with Stability Guarantees , 2017, NIPS.
[7] J. Pearl. Causality: Models, Reasoning and Inference , 2000 .
[8] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse , 1996 .
[9] Owain Evans,et al. Trial without Error: Towards Safe Reinforcement Learning via Human Intervention , 2017, AAMAS.
[10] Laurent Orseau,et al. AI Safety Gridworlds , 2017, ArXiv.
[11] Ronald A. Howard,et al. Influence Diagrams , 2005, Decis. Anal..
[12] Peter Dayan,et al. Technical Note: Q-Learning , 2004, Machine Learning.
[13] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[14] John Salvatier,et al. Agent-Agnostic Human-in-the-Loop Reinforcement Learning , 2017, ArXiv.
[15] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[16] John J. Grefenstette,et al. Evolutionary Algorithms for Reinforcement Learning , 1999, J. Artif. Intell. Res..
[17] Maximilian Lam,et al. Quantized Reinforcement Learning (QUARL) , 2019, ArXiv.
[18] C. Robert. Superintelligence: Paths, Dangers, Strategies , 2017 .
[19] Christian Igel,et al. Uncertainty handling CMA-ES for reinforcement learning , 2009, GECCO.
[20] Sachin S. Talathi,et al. Fixed Point Quantization of Deep Convolutional Networks , 2015, ICML.
[21] Yuval Tassa,et al. Safe Exploration in Continuous Action Spaces , 2018, ArXiv.
[22] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[23] Sergey Levine,et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[24] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[25] Stephen M. Omohundro,et al. The Basic AI Drives , 2008, AGI.
[26] Shane Legg,et al. Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings , 2019, ArXiv.
[27] Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.