Randomized Prior Functions for Deep Reinforcement Learning
Ian Osband | John Aslanides | Albin Cassirer