Yee Whye Teh | Razvan Pascanu | Wojciech M. Czarnecki | Nicolas Heess | Jonathan Schwarz | Alexandre Galashov | Dhruva Tirumala | Leonard Hasenclever | Guillaume Desjardins | Siddhant M. Jayakumar
[1] H. Simon, et al. Rational choice and the structure of the environment, 1956, Psychological review.
[2] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.
[3] Jing Peng, et al. Function Optimization using Connectionist Reinforcement Learning Algorithms, 1991.
[4] Stephen J. Wright, et al. Numerical Optimization, 2018, Fundamental Statistical Inference.
[5] Emanuel Todorov, et al. Linearly-solvable Markov decision problems, 2006, NIPS.
[6] Marc Toussaint, et al. Robot trajectory optimization using approximate inference, 2009, ICML '09.
[7] J. Andrew Bagnell, et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.
[8] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[9] Daniel A. Braun, et al. Information, Utility & Bounded Rationality, 2011, ArXiv.
[10] Doina Precup, et al. An information-theoretic approach to curiosity-driven reinforcement learning, 2012, Theory in Biosciences.
[11] Naftali Tishby, et al. Trading Value and Information in MDPs, 2012.
[12] Marc Toussaint, et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference, 2012, Robotics: Science and Systems.
[13] Vicenç Gómez, et al. Optimal control as a graphical model inference problem, 2009, Machine Learning.
[14] Daniel A. Braun, et al. Thermodynamics as a theory of decision-making with information-processing costs, 2012, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.
[15] Sergey Levine, et al. Variational Policy Search via Trajectory Optimization, 2013, NIPS.
[16] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.
[17] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[18] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[19] Yuval Tassa, et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[20] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[21] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[22] Razvan Pascanu, et al. Policy Distillation, 2015, ICLR.
[23] Ruslan Salakhutdinov, et al. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, 2015, ICLR.
[24] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[26] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[27] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[28] Sergey Levine, et al. Guided Policy Search via Approximate Mirror Descent, 2016, NIPS.
[29] Sergey Levine, et al. Path integral guided policy search, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[30] Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.
[31] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[32] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[33] Alexander A. Alemi, et al. Deep Variational Information Bottleneck, 2017, ICLR.
[34] Yuval Tassa, et al. Emergence of Locomotion Behaviours in Rich Environments, 2017, ArXiv.
[35] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, ArXiv.
[36] Yee Whye Teh, et al. Distral: Robust multitask reinforcement learning, 2017, NIPS.
[37] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.
[38] Naftali Tishby, et al. A Unified Bellman Equation for Causal Information and Value in Markov Decision Processes, 2017, ArXiv.
[39] Karol Hausman, et al. Learning an Embedding Space for Transferable Robot Skills, 2018, ICLR.
[40] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[41] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[42] Alexander A. Alemi, et al. Fixing a Broken ELBO, 2017, ICML.
[43] Yee Whye Teh, et al. Mix&Match - Agent Curricula for Reinforcement Learning, 2018, ICML.
[44] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[45] Dale Schuurmans, et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, 2017, ICLR.
[46] Martin A. Riedmiller, et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch, 2018, ICML.
[47] Andrew Zisserman, et al. Kickstarting Deep Reinforcement Learning, 2018, ArXiv.
[48] Huchuan Lu, et al. Deep Mutual Learning, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[49] Sergey Levine, et al. InfoBot: Transfer and Exploration via the Information Bottleneck, 2019, ICLR.