Che Wang | Zheng Wang | Zijian Zhou | Keith Ross | Yanqiu Wu | Xinyue Chen | Qing Deng
[1] Nolan Wagener, et al. Fast Policy Learning through Imitation and Reinforcement, 2018, UAI.
[2] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[3] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.
[4] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[5] Stefan Schaal, et al. Is imitation learning the route to humanoid robots?, 1999, Trends in Cognitive Sciences.
[6] Bernhard Schölkopf, et al. A Kernel Two-Sample Test, 2012, J. Mach. Learn. Res.
[7] Mohamed Medhat Gaber, et al. Imitation Learning, 2017, ACM Comput. Surv.
[8] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[9] Byron Boots, et al. Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning, 2018, ICLR.
[10] Justin Fu, et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning, 2020, ArXiv.
[11] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[12] Sergey Levine, et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, 2019, ArXiv.
[13] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[14] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.
[15] Matthieu Geist, et al. Boosted Bellman Residual Minimization Handling Expert Demonstrations, 2014, ECML/PKDD.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Alessandro Lazaric, et al. Direct Policy Iteration with Demonstrations, 2015, IJCAI.
[18] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[19] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[20] Joelle Pineau, et al. Learning from Limited Demonstrations, 2013, NIPS.
[21] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[22] Yiming Zhang, et al. Supervised Policy Update for Deep Reinforcement Learning, 2018, ICLR.
[23] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[24] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, ArXiv.
[25] Joelle Pineau, et al. Benchmarking Batch Deep Reinforcement Learning Algorithms, 2019, ArXiv.
[26] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[27] Thorsten Joachims, et al. Batch learning from logged bandit feedback through counterfactual risk minimization, 2015, J. Mach. Learn. Res.
[28] Stefano Ermon, et al. Model-Free Imitation Learning with Policy Optimization, 2016, ICML.
[29] Sergey Levine, et al. Off-Policy Evaluation via Off-Policy Classification, 2019, NeurIPS.
[30] Lihong Li, et al. Learning from Logged Implicit Exploration Data, 2010, NIPS.
[31] Yifan Wu, et al. Behavior Regularized Offline Reinforcement Learning, 2019, ArXiv.
[32] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[33] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[34] Martin A. Riedmiller, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[35] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[36] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[37] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[38] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[39] Byron Boots, et al. Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, 2017, ICML.
[40] Yang Gao, et al. Reinforcement Learning from Imperfect Demonstrations, 2018, ICLR.
[41] Qing Wang, et al. Exponentially Weighted Imitation Learning for Batched Historical Data, 2018, NeurIPS.
[42] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[43] Tom Schaul, et al. Deep Q-learning From Demonstrations, 2017, AAAI.
[44] Mehrdad Farajtabar, et al. More Robust Doubly Robust Off-policy Evaluation, 2018, ICML.
[45] Sergey Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, ArXiv.
[46] Che Wang, et al. Towards Simplicity in Deep Reinforcement Learning: Streamlined Off-Policy Learning, 2019, ArXiv.
[47] Dale Schuurmans, et al. Striving for Simplicity in Off-policy Deep Reinforcement Learning, 2019, ArXiv.