Solving Continuous Control via Q-learning
Martin A. Riedmiller, Markus Wulfmeier, Wilko Schwarting, Igor Gilitschenski, T. Seyde, D. Rus, Peter Werner
[1] Yashraj S. Narang,et al. Accelerated Policy Learning with Parallel Differentiable Simulation , 2022, ICLR.
[2] Xiaolong Wang,et al. Temporal Difference Learning for Model Predictive Control , 2022, ICML.
[3] Alessandro Lazaric,et al. Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning , 2021, ICLR.
[4] Igor Gilitschenski,et al. Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies , 2021, NeurIPS.
[5] Philipp Reist,et al. Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning , 2021, CoRL.
[6] Miles Macklin,et al. Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning , 2021, NeurIPS Datasets and Benchmarks.
[7] Stefano V. Albrecht,et al. Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing , 2021, 2102.07475.
[8] Ankush Gupta,et al. Representation Matters: Improving Perception and Exploration for Robotics , 2020, 2021 IEEE International Conference on Robotics and Automation (ICRA).
[9] Petar Kormushev,et al. Learning to Represent Action Values as a Hypergraph on the Action Vertices , 2020, ICLR.
[10] S. Karaman,et al. Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles , 2020, CoRL.
[11] Mohammad Norouzi,et al. Mastering Atari with Discrete World Models , 2020, ICLR.
[12] Stephen C. Adams,et al. Value-Decomposition Multi-Agent Actor-Critics , 2020, AAAI.
[13] Shimon Whiteson,et al. FACMAC: Factored Multi-Agent Centralised Policy Gradients , 2020, NeurIPS.
[14] Beining Han,et al. Off-Policy Multi-Agent Decomposed Policy Gradients , 2020, ICLR.
[15] Timothy Verstraeten,et al. Cooperative Prioritized Sweeping , 2021, AAMAS.
[16] Yuval Tassa,et al. dm_control: Software and Tasks for Continuous Control , 2020, Softw. Impacts.
[17] Sergio Gomez Colmenarejo,et al. Acme: A Research Framework for Distributed Reinforcement Learning , 2020, ArXiv.
[18] Pieter Abbeel,et al. Planning to Explore via Self-Supervised World Models , 2020, ICML.
[19] Pieter Abbeel,et al. CURL: Contrastive Unsupervised Representations for Reinforcement Learning , 2020, ICML.
[20] Andriy Mnih,et al. Q-Learning in enormous action spaces via amortized approximate maximization , 2020, ArXiv.
[21] Jimmy Ba,et al. Dream to Control: Learning Behaviors by Latent Imagination , 2019, ICLR.
[22] S. Whiteson,et al. Deep Coordination Graphs , 2019, ICML.
[23] Shimon Whiteson,et al. Growing Action Spaces , 2019, ICML.
[24] Yunhao Tang,et al. Discretizing Continuous Action Space for On-Policy Optimization , 2019, AAAI.
[25] Martin A. Riedmiller,et al. Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics , 2020, CoRL.
[26] S. Levine,et al. Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning , 2019, CoRL.
[27] Shimon Whiteson,et al. Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning , 2019, ArXiv.
[28] Sangbae Kim,et al. Mini Cheetah: A Platform for Pushing the Limits of Dynamic Quadruped Control , 2019, 2019 International Conference on Robotics and Automation (ICRA).
[29] Yung Yi,et al. QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning , 2019, ICML.
[30] Henry Zhu,et al. Soft Actor-Critic Algorithms and Applications , 2018, ArXiv.
[31] Sergey Levine,et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.
[32] Shimon Whiteson,et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning , 2018, ICML.
[33] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[34] Matthew W. Hoffman,et al. Distributed Distributional Deterministic Policy Gradients , 2018, ICLR.
[35] Arash Tavakoli,et al. Action Branching Architectures for Deep Reinforcement Learning , 2017, AAAI.
[36] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[37] Guy Lever,et al. Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward , 2018, AAMAS.
[38] Xiangxiang Chu,et al. Parameter Sharing Deep Deterministic Policy Gradient for Cooperative Multi-agent Reinforcement Learning , 2017, ArXiv.
[39] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[40] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[41] Romain Laroche,et al. Hybrid Reward Architecture for Reinforcement Learning , 2017, NIPS.
[42] Yi Wu,et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.
[43] Balaraman Ravindran,et al. Learning to Factor Policies and Action-Value Functions: Factored Action Space Representations for Deep Reinforcement learning , 2017, ArXiv.
[44] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[45] Navdeep Jaitly,et al. Discrete Sequential Prediction of Continuous Actions for Deep RL , 2017, ArXiv.
[46] Mykel J. Kochenderfer,et al. Cooperative Multi-agent Control Using Deep Reinforcement Learning , 2017, AAMAS Workshops.
[47] Shimon Whiteson,et al. Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning , 2017, ICML.
[48] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[49] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[50] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[51] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[52] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[53] Guillaume J. Laurent,et al. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems , 2012, The Knowledge Engineering Review.
[54] Markus Wulfmeier,et al. Strength Through Diversity: Robust Behavior Learning via Mixture Policies , 2010.
[55] Guillaume J. Laurent,et al. Hysteretic Q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams , 2007, 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[56] Bart De Schutter,et al. Decentralized Reinforcement Learning Control of a Robotic Manipulator , 2006, 2006 9th International Conference on Control, Automation, Robotics and Vision.
[57] Nikos A. Vlassis,et al. Collaborative Multiagent Reinforcement Learning by Payoff Propagation , 2006, J. Mach. Learn. Res..
[58] Sean Luke,et al. Lenient learners in cooperative multiagent systems , 2006, AAMAS '06.
[59] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[60] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[61] Stuart J. Russell,et al. Q-Decomposition for Reinforcement Learning Agents , 2003, ICML.
[62] Michail G. Lagoudakis,et al. Coordinated Reinforcement Learning , 2002, ICML.
[63] Martin Lauer,et al. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems , 2000, ICML.
[64] Andrew W. Moore,et al. Distributed Value Functions , 1999, ICML.
[65] Craig Boutilier,et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.
[66] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984.