Shane Legg | Alex Graves | Demis Hassabis | Olivier Pietquin | Meire Fortunato | Charles Blundell | Ian Osband | Jacob Menick | Bilal Piot | Mohammad Gheshlaghi Azar | Vlad Mnih | Remi Munos
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] Robert E. Kalaba, et al. Dynamic Programming and Modern Control Theory, 1966.
[3] Geoffrey E. Hinton, et al. Keeping the neural networks simple by minimizing the description length of the weights, 1993, COLT '93.
[4] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[5] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[6] Christopher M. Bishop, et al. Training with Noise is Equivalent to Tikhonov Regularization, 1995, Neural Computation.
[7] David K. Smith, et al. Dynamic Programming and Optimal Control. Volume 1, 1996.
[8] John J. Grefenstette, et al. Evolutionary Algorithms for Reinforcement Learning, 1999, J. Artif. Intell. Res.
[9] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[10] Nuttapong Chentanez, et al. Intrinsically Motivated Reinforcement Learning, 2004, NIPS.
[11] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[12] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[14] Peter Auer, et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning, 2006, NIPS.
[15] Pierre-Yves Oudeyer, et al. What is Intrinsic Motivation? A Typology of Computational Approaches, 2007, Frontiers Neurorobotics.
[16] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[17] O. Pietquin, et al. Managing Uncertainty within Value Function Approximation in Reinforcement Learning, 2010.
[18] Jürgen Schmidhuber, et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010), 2010, IEEE Transactions on Autonomous Mental Development.
[19] Matthieu Geist, et al. Kalman Temporal Differences, 2010, J. Artif. Intell. Res.
[20] Alex Graves, et al. Practical Variational Inference for Neural Networks, 2011, NIPS.
[21] Jérémy Fix, et al. Monte-Carlo Swarm Policy Search, 2012, ICAISC.
[22] Tor Lattimore, et al. The Sample-Complexity of General Reinforcement Learning, 2013, ICML.
[23] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, ICML.
[24] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[25] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, ArXiv.
[26] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[27] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[28] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[29] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[30] Hossein Mobahi, et al. Training Recurrent Neural Networks by Diffusion, 2016, ArXiv.
[31] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[32] Zachary Chase Lipton, et al. Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking, 2016.
[33] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[34] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[35] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[36] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[37] Shai Shalev-Shwartz, et al. On Graduated Optimization for Stochastic Non-Convex Problems, 2015, ICML.
[38] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.
[39] Jianfeng Gao, et al. Efficient Exploration for Dialog Policy Learning with Deep BBQ Networks & Replay Buffer Spiking, 2016, ArXiv.
[40] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, ArXiv.
[41] Oriol Vinyals, et al. Bayesian Recurrent Neural Networks, 2017, ArXiv.
[42] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[43] Georg Ostrovski, et al. Count-Based Exploration with Neural Density Models, 2017.
[44] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[45] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[46] Jianfeng Gao, et al. BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems, 2016, AAAI.
[47] Zheng Wen, et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.