Yee Whye Teh | Razvan Pascanu | Hyeonwoo Noh | Alexandre Galashov | Dhruva Tirumala | Leonard Hasenclever | Jonathan Schwarz | Guillaume Desjardins | Wojciech Marian Czarnecki | Arun Ahuja | Nicolas Heess
[1] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.
[2] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[3] Sergey Levine, et al. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables, 2019, ICML.
[4] Alexander A. Alemi, et al. Deep Variational Information Bottleneck, 2017, ICLR.
[5] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[6] Doina Precup, et al. The Option-Critic Architecture, 2016, AAAI.
[7] David Barber, et al. An Auxiliary Variational Method, 2004, ICONIP.
[8] Yee Whye Teh, et al. Meta reinforcement learning as task inference, 2019, ArXiv.
[9] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[10] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[11] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, ArXiv.
[12] Sergey Levine, et al. Diversity is All You Need: Learning Skills without a Reward Function, 2018, ICLR.
[13] Sergey Levine, et al. Guided Policy Search via Approximate Mirror Descent, 2016, NIPS.
[14] Doina Precup, et al. An information-theoretic approach to curiosity-driven reinforcement learning, 2012, Theory in Biosciences.
[15] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[16] Sergey Levine, et al. Latent Space Policies for Hierarchical Reinforcement Learning, 2018, ICML.
[17] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.
[18] Kevin P. Murphy, et al. Machine Learning: A Probabilistic Perspective, 2012, Adaptive Computation and Machine Learning series.
[19] Sergey Levine, et al. Path integral guided policy search, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[20] Jan Peters, et al. Learning movement primitive libraries through probabilistic segmentation, 2017, Int. J. Robotics Res.
[21] Tatiana V. Guy, et al. Decision Making with Imperfect Decision Makers, 2011.
[22] Jan Peters, et al. Probabilistic inference for determining options in reinforcement learning, 2016, Machine Learning.
[23] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[24] Nir Friedman, et al. Probabilistic Graphical Models: Principles and Techniques, 2009.
[25] Nicolas Le Roux, et al. Understanding the impact of entropy on policy optimization, 2018, ICML.
[26] Yuval Tassa, et al. DeepMind Control Suite, 2018, ArXiv.
[27] Jan Peters, et al. Probabilistic Movement Primitives, 2013, NIPS.
[28] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[29] René Boel, et al. Discrete event dynamic systems: Theory and applications, 2002.
[30] Yee Whye Teh, et al. Information asymmetry in KL-regularized RL, 2019, ICLR.
[31] Marc Toussaint, et al. Robot trajectory optimization using approximate inference, 2009, ICML.
[32] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[33] Sergey Levine, et al. Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?, 2019, ArXiv.
[34] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[35] Ronen I. Brafman, et al. Prioritized Goal Decomposition of Markov Decision Processes: Toward a Synthesis of Classical and Decision Theoretic Planning, 1997, IJCAI.
[36] Martin A. Riedmiller, et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch, 2018, ICML.
[37] Daniel A. Braun, et al. Information, Utility and Bounded Rationality, 2011, AGI.
[38] Emanuel Todorov, et al. Linearly-solvable Markov decision problems, 2006, NIPS.
[39] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[40] Daniel A. Braun, et al. Thermodynamics as a theory of decision-making with information-processing costs, 2012, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.
[41] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res.
[42] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[43] Ryan P. Adams, et al. Composing graphical models with neural networks for structured representations and fast inference, 2016, NIPS.
[44] Ion Stoica, et al. Multi-Level Discovery of Deep Options, 2017, ArXiv.
[45] Ion Stoica, et al. DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations, 2017, CoRL.
[46] Marc Toussaint, et al. Probabilistic inference for solving discrete and continuous state Markov Decision Processes, 2006, ICML.
[47] Pieter Abbeel, et al. A Simple Neural Attentive Meta-Learner, 2017, ICLR.
[48] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[49] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[50] Doina Precup, et al. When Waiting is not an Option: Learning Options with a Deliberation Cost, 2017, AAAI.
[51] Christoph Salge, et al. Empowerment - an Introduction, 2013, ArXiv.
[52] Tom Schaul, et al. FeUdal Networks for Hierarchical Reinforcement Learning, 2017, ICML.
[53] Kate Saenko, et al. Learning Multi-Level Hierarchies with Hindsight, 2017, ICLR.
[54] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2011, Machine Learning.
[55] Dushyant Rao, et al. Data-efficient Hindsight Off-policy Option Learning, 2020, ArXiv.
[56] Yuval Tassa, et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[57] Satinder Singh, et al. Between MDPs and semi-MDPs, 1999.
[58] Yifan Wu, et al. Behavior Regularized Offline Reinforcement Learning, 2019, ArXiv.
[59] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[60] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[61] Sergey Levine, et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, 2019, ArXiv.
[62] Max Welling, et al. Markov Chain Monte Carlo and Variational Inference: Bridging the Gap, 2014, ICML.
[63] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.
[64] Zeb Kurth-Nelson, et al. Learning to reinforcement learn, 2016, CogSci.
[65] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[66] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.
[67] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[68] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[69] Yoshua Bengio, et al. A Recurrent Latent Variable Model for Sequential Data, 2015, NIPS.
[70] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[71] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[72] Sergey Levine, et al. Data-Efficient Hierarchical Reinforcement Learning, 2018, NeurIPS.
[73] Sergey Levine, et al. Variational Policy Search via Trajectory Optimization, 2013, NIPS.
[74] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[75] Peter L. Bartlett, et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[76] Naftali Tishby, et al. Trading Value and Information in MDPs, 2012.
[77] Marc Toussaint, et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference (Extended Abstract), 2013, IJCAI.
[78] Yuval Tassa, et al. Learning and Transfer of Modulated Locomotor Controllers, 2016, ArXiv.
[79] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.
[80] Yuval Tassa, et al. Emergence of Locomotion Behaviours in Rich Environments, 2017, ArXiv.
[81] Karol Hausman, et al. Learning an Embedding Space for Transferable Robot Skills, 2018, ICLR.
[82] Doina Precup. Temporal abstraction in reinforcement learning, 2000, PhD thesis, University of Massachusetts Amherst.
[83] M. Botvinick, et al. Mental labour, 2018, Nature Human Behaviour.
[84] Yee Whye Teh, et al. Distral: Robust multitask reinforcement learning, 2017, NIPS.
[85] Alexander A. Alemi, et al. Fixing a Broken ELBO, 2017, ICML.
[86] Joshua B. Tenenbaum, et al. Learning to Share and Hide Intentions using Information Regularization, 2018, NeurIPS.
[87] Martin A. Riedmiller, et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning, 2020, ICLR.
[88] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[89] H. Simon, et al. Rational choice and the structure of the environment, 1956, Psychological Review.
[90] Christopher M. Bishop. Pattern Recognition and Machine Learning, 2006, Springer.
[91] Daan Wierstra, et al. Variational Intrinsic Control, 2016, ICLR.
[92] Pieter Abbeel, et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning, 2016, ICLR.
[93] Jürgen Schmidhuber, et al. HQ-Learning, 1997, Adapt. Behav.
[94] Pieter Abbeel, et al. Policy transfer via modularity and reward guiding, 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[95] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[96] Jürgen Schmidhuber, et al. Neural sequence chunkers, 1991, Forschungsberichte, TU Munich.
[97] J. Andrew Bagnell, et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.
[98] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[99] Yee Whye Teh, et al. Neural probabilistic motor primitives for humanoid control, 2018, ICLR.
[100] Chrystopher L. Nehaniv, et al. Empowerment: a universal agent-centric measure of control, 2005, 2005 IEEE Congress on Evolutionary Computation.
[101] Jun Nakanishi, et al. Learning Attractor Landscapes for Learning Motor Primitives, 2002, NIPS.
[102] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[103] Tom Schaul, et al. Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement, 2018, ICML.
[104] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[105] Daniel Polani, et al. Information Theory of Decisions and Actions, 2011.
[106] Pieter Abbeel, et al. Meta Learning Shared Hierarchies, 2017, ICLR.
[107] Naftali Tishby, et al. A Unified Bellman Equation for Causal Information and Value in Markov Decision Processes, 2017, ArXiv.
[108] Wojciech Zaremba, et al. Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model, 2016, ArXiv.
[109] Yee Whye Teh, et al. Transferring Task Goals via Hierarchical Reinforcement Learning, 2018.
[110] Jan Peters, et al. Hierarchical Relative Entropy Policy Search, 2014, AISTATS.
[111] Sergey Levine, et al. InfoBot: Transfer and Exploration via the Information Bottleneck, 2019, ICLR.
[112] Shakir Mohamed, et al. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, 2015, NIPS.
[113] Vicenç Gómez, et al. Optimal control as a graphical model inference problem, 2009, Machine Learning.