Shuai Han | Zhenghao Zhang | Shuai Lü | Junwei Zhang
[1] Daochen Zha, et al. Experience Replay Optimization, 2019, IJCAI.
[2] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[3] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[4] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[5] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[6] Elman Mansimov, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.
[7] Stefano Ermon, et al. Multi-Agent Generative Adversarial Imitation Learning, 2018, NeurIPS.
[8] Fei Sha, et al. Actor-Attention-Critic for Multi-Agent Reinforcement Learning, 2018, ICML.
[9] Alexei A. Efros, et al. Large-Scale Study of Curiosity-Driven Learning, 2018, ICLR.
[10] Baochun Li, et al. Post: Device Placement with Cross-Entropy Minimization and Proximal Policy Optimization, 2018, NeurIPS.
[11] Junxiang Li, et al. Deep reinforcement learning for pedestrian collision avoidance and human-machine cooperative driving, 2020, Inf. Sci.
[12] Qi Cai, et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy, 2019, ArXiv.
[13] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[14] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[15] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, ArXiv.
[16] Robert Loftin, et al. Better Exploration with Optimistic Actor-Critic, 2019, NeurIPS.
[17] Hongbing Wang, et al. A multi-agent reinforcement learning approach to dynamic service composition, 2016.
[18] Hao Wu, et al. Mastering Complex Control in MOBA Games with Deep Reinforcement Learning, 2019, AAAI.
[19] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[20] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[21] Larry Rudolph, et al. Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?, 2018, ArXiv.
[22] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[23] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[24] Emanuel Todorov, et al. Convex and analytically-invertible dynamics with contacts and constraints: Theory and implementation in MuJoCo, 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).
[25] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[26] Abhinav Gupta, et al. Robust Adversarial Reinforcement Learning, 2017, ICML.
[27] James Bergstra, et al. Benchmarking Reinforcement Learning Algorithms on Real-World Robots, 2018, CoRL.
[28] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[29] Xin Xu, et al. Reinforcement learning algorithms with function approximation: Recent advances and applications, 2014, Inf. Sci.
[30] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[31] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.