Dynamics-Aware Comparison of Learned Reward Functions
Ashwin Balakrishna | Rowan McAllister | Adrien Gaidon | Blake Wulfe | Logan Ellis | Jean Mercat