暂无分享,去创建一个
[1] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[2] Marek Petrik,et al. Safe Policy Improvement by Minimizing Robust Baseline Regret , 2016, NIPS.
[3] Matthew E. Taylor,et al. Pre-training Neural Networks with Human Demonstrations for Deep Reinforcement Learning , 2017, ArXiv.
[4] Tom Schaul,et al. Deep Q-learning From Demonstrations , 2017, AAAI.
[5] Tom Schaul,et al. Successor Features for Transfer in Reinforcement Learning , 2016, NIPS.
[6] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[7] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[8] Amnon Shashua,et al. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving , 2016, ArXiv.
[9] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[10] Katja Hofmann,et al. The Atari Grand Challenge Dataset , 2017, ArXiv.
[11] David Budden,et al. Distributed Prioritized Experience Replay , 2018, ICLR.
[12] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[13] Philip Bachman,et al. Deep Reinforcement Learning that Matters , 2017, AAAI.
[14] Andreas Krause,et al. Contextual Gaussian Process Bandit Optimization , 2011, NIPS.
[15] Sonia Chernova,et al. Integrating reinforcement learning with human demonstrations of varying ability , 2011, AAMAS.
[16] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[17] Jeffrey S. Racine,et al. Consistent cross-validatory model-selection for dependent data: hv-block cross-validation , 2000 .
[18] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[19] Brett Browning,et al. A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..
[20] Luca Bascetta,et al. Adaptive Step-Size for Policy Gradient Methods , 2013, NIPS.
[21] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[22] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[23] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[24] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[25] David K. Smith,et al. Dynamic Programming and Optimal Control. Volume 1 , 1996 .
[26] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[27] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[28] Romain Laroche,et al. Safe Policy Improvement with Baseline Bootstrapping , 2017, ICML.
[29] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[30] Yiming Zhang,et al. Supervised Policy Update for Deep Reinforcement Learning , 2018, ICLR.
[31] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.
[32] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[33] Philip S. Thomas,et al. High Confidence Policy Improvement , 2015, ICML.
[34] Rémi Munos,et al. Observe and Look Further: Achieving Consistent Performance on Atari , 2018, ArXiv.
[35] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[36] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[37] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[38] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[39] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[40] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[41] Javier García,et al. A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res..
[42] Sergey Levine,et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics , 2014, NIPS.
[43] Gongping Yang,et al. On the Class Imbalance Problem , 2008, 2008 Fourth International Conference on Natural Computation.
[44] Shie Mannor,et al. Scaling Up Robust MDPs by Reinforcement Learning , 2013, ArXiv.