Making Sense of Reinforcement Learning and Probabilistic Inference
[1] Brendan O'Donoghue,et al. Variational Bayesian Reinforcement Learning with Regret Bounds , 2018, NeurIPS.
[2] Tor Lattimore,et al. Behaviour Suite for Reinforcement Learning , 2019, ICLR.
[3] Matthew Fellows,et al. VIREL: A Variational Inference Framework for Reinforcement Learning , 2018, NeurIPS.
[4] Sergey Levine,et al. Diversity is All You Need: Learning Skills without a Reward Function , 2018, ICLR.
[5] Zheng Wen,et al. Deep Exploration via Randomized Value Functions , 2017, J. Mach. Learn. Res..
[6] Albin Cassirer,et al. Randomized Prior Functions for Deep Reinforcement Learning , 2018, NeurIPS.
[7] Sergey Levine,et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review , 2018, ArXiv.
[8] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[9] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[10] Ian Osband,et al. The Uncertainty Bellman Equation and Exploration , 2017, ICML.
[11] Benjamin Van Roy,et al. A Tutorial on Thompson Sampling , 2017, Found. Trends Mach. Learn..
[12] Claudio Gentile,et al. Boltzmann Exploration Done Right , 2017, NIPS.
[13] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[14] Koray Kavukcuoglu,et al. Combining policy gradient and Q-learning , 2016, ICLR.
[15] Benjamin Van Roy,et al. Why is Posterior Sampling Better than Optimism for Reinforcement Learning? , 2016, ICML.
[16] Benjamin Van Roy,et al. On Lower Bounds for Regret in Reinforcement Learning , 2016, ArXiv.
[17] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[18] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[19] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[20] Benjamin Van Roy,et al. Generalization and Exploration via Randomized Value Functions , 2014, ICML.
[21] Shie Mannor,et al. Bayesian Reinforcement Learning: A Survey , 2015, Found. Trends Mach. Learn..
[22] Julien Cornebise,et al. Weight Uncertainty in Neural Networks , 2015, ArXiv.
[23] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[24] Benjamin Van Roy,et al. Learning to Optimize via Information-Directed Sampling , 2014, NIPS.
[25] Rémi Munos,et al. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning , 2014, Found. Trends Mach. Learn..
[26] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[27] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[28] Marc Toussaint,et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference , 2012, Robotics: Science and Systems.
[29] Peter Dayan,et al. Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search , 2012, NIPS.
[30] Vicenç Gómez,et al. Optimal control as a graphical model inference problem , 2009, Machine Learning.
[31] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[32] Jan Peters,et al. Policy Search for Motor Primitives in Robotics , 2008, NIPS.
[33] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[34] David Barber,et al. Variational methods for Reinforcement Learning , 2010, AISTATS.
[35] Nir Friedman,et al. Probabilistic Graphical Models - Principles and Techniques , 2009 .
[36] Emanuel Todorov,et al. Efficient computation of optimal actions , 2009, Proceedings of the National Academy of Sciences.
[37] Marc Toussaint,et al. Robot trajectory optimization using approximate inference , 2009, ICML '09.
[38] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[39] Emanuel Todorov,et al. General duality between optimal control and estimation , 2008, 2008 47th IEEE Conference on Decision and Control.
[40] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[41] Peter W. Glynn,et al. Stochastic Simulation: Algorithms and Analysis , 2007 .
[42] Emanuel Todorov,et al. Linearly-solvable Markov decision problems , 2006, NIPS.
[43] Marc Toussaint,et al. Probabilistic inference for solving discrete and continuous state Markov Decision Processes , 2006, ICML.
[44] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[45] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[46] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[47] J. L. Roux. An Introduction to the Kalman Filter , 2003 .
[48] David K. Smith,et al. Dynamic Programming and Optimal Control. Volume 1 , 1996 .
[49] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[50] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[51] C. Watkins. Learning from delayed rewards , 1989 .
[52] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[53] Abraham Wald,et al. Statistical Decision Functions , 1951 .
[54] R. Fisher. The Advanced Theory of Statistics , 1943, Nature.
[55] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933 .