Reinforcement Learning with Deep Energy-Based Policies
Sergey Levine | Pieter Abbeel | Haoran Tang | Tuomas Haarnoja
[1] R. Mazo. On the theory of brownian motion, 1973.
[2] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[3] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[4] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[5] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[6] Geoffrey E. Hinton, et al. Reinforcement Learning with Factored States and Actions, 2004, J. Mach. Learn. Res.
[7] H. Kappen. Path integrals and symmetry breaking for optimal control theory, 2005, physics/0505066.
[8] Emanuel Todorov, et al. Linearly-solvable Markov decision problems, 2006, NIPS.
[9] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[10] Emanuel Todorov, et al. General duality between optimal control and estimation, 2008, 2008 47th IEEE Conference on Decision and Control.
[11] Frédo Durand, et al. Linear Bellman combination for control of character animation, 2009, ACM Trans. Graph.
[12] Marc Toussaint, et al. Robot trajectory optimization using approximate inference, 2009, ICML '09.
[13] Emanuel Todorov, et al. Compositionality of optimal control laws, 2009, NIPS.
[14] Kenji Doya, et al. Free-Energy Based Reinforcement Learning for Vision-Based Navigation with High-Dimensional Sensory Inputs, 2010, ICONIP.
[15] J. Andrew Bagnell, et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.
[16] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[17] Junichiro Yoshimoto, et al. Free-energy-based reinforcement learning in a partially observable environment, 2010, ESANN.
[18] Martin A. Riedmiller, et al. Reinforcement learning in feedback control, 2011, Machine Learning.
[19] Gerhard Neumann, et al. Variational Inference for Policy Search in changing situations, 2011, ICML.
[20] Marc Toussaint, et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference, 2012, Robotics: Science and Systems.
[21] Jan Peters, et al. Hierarchical Relative Entropy Policy Search, 2014, AISTATS.
[22] Yee Whye Teh, et al. Actor-Critic Reinforcement Learning with Energy-Based Policies, 2012, EWRL.
[23] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[24] Philip Thomas, et al. Bias in Natural Actor-Critic Algorithms, 2014, ICML.
[25] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[26] Sergey Levine, et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, 2014, NIPS.
[27] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[28] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[29] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[30] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[31] Qiang Liu, et al. Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning, 2016, ArXiv.
[32] Yann LeCun, et al. Energy-based Generative Adversarial Network, 2016, ICLR.
[33] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[34] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[35] Dilin Wang, et al. Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm, 2016, NIPS.
[36] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[37] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[38] Koray Kavukcuoglu, et al. PGQ: Combining policy gradient and Q-learning, 2016, ArXiv.
[39] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[40] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[41] Yoshua Bengio, et al. Deep Directed Generative Models with Energy-Based Probability Estimation, 2016, ArXiv.
[42] Yuval Tassa, et al. Learning and Transfer of Modulated Locomotor Controllers, 2016, ArXiv.
[43] Yang Liu, et al. Stein Variational Policy Gradient, 2017, UAI.
[44] Pieter Abbeel, et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning, 2016, ICLR.
[45] Sergey Levine, et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, 2016, ICLR.
[46] Trevor Darrell, et al. Loss is its own Reward: Self-Supervision for Reinforcement Learning, 2016, ICLR.
[47] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.
[48] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[49] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.