[1] Javier García,et al. A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res..
[2] Brett Browning,et al. A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..
[3] Stefano Ermon,et al. Generative Adversarial Imitation Learning , 2016, NIPS.
[4] Dorsa Sadigh,et al. Asking Easy Questions: A User-Friendly Approach to Active Reward Learning , 2019, CoRL.
[5] Eyal Amir,et al. Bayesian Inverse Reinforcement Learning , 2007, IJCAI.
[6] Marco Pavone,et al. Risk-Sensitive Generative Adversarial Imitation Learning , 2018, AISTATS.
[7] Prashant Doshi,et al. A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress , 2018, Artif. Intell..
[8] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[9] Yuchen Cui,et al. Risk-Aware Active Inverse Reinforcement Learning , 2018, CoRL.
[10] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[11] Anca D. Dragan,et al. DART: Noise Injection for Robust Imitation Learning , 2017, CoRL.
[12] Scott Niekum,et al. Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations , 2019, CoRL.
[13] Kyunghyun Cho,et al. Query-Efficient Imitation Learning for End-to-End Simulated Driving , 2017, AAAI.
[14] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[15] Tom Schaul,et al. Successor Features for Transfer in Reinforcement Learning , 2016, NIPS.
[16] Brendan J. Frey,et al. PixelGAN Autoencoders , 2017, NIPS.
[17] R. Duncan Luce,et al. Individual Choice Behavior: A Theoretical Analysis , 1979 .
[18] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[19] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[20] John Schulman,et al. Concrete Problems in AI Safety , 2016, ArXiv.
[21] Sergey Levine,et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization , 2016, ICML.
[22] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[23] Philip S. Thomas,et al. High-Confidence Off-Policy Evaluation , 2015, AAAI.
[24] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[25] Peter Stone,et al. Importance Sampling Policy Evaluation with an Estimated Behavior Policy , 2018, ICML.
[26] Sergey Levine,et al. A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models , 2016, ArXiv.
[27] Sergey Levine,et al. Extending Deep Model Predictive Control with Safety Augmented Value Estimation from Demonstrations , 2019, ArXiv.
[28] Peter Stone,et al. Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..
[29] A. Vries. Value at Risk , 2019, Derivatives.
[30] Carl Doersch,et al. Tutorial on Variational Autoencoders , 2016, ArXiv.
[31] Peter Stone,et al. Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation , 2016, AAAI.
[32] Dean Pomerleau,et al. Efficient Training of Artificial Neural Networks for Autonomous Navigation , 1991, Neural Computation.
[33] Shie Mannor,et al. Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach , 2015, NIPS.
[34] Peter Stone,et al. Stochastic Grounded Action Transformation for Robot Learning in Simulation , 2020, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[35] Nando de Freitas,et al. Playing hard exploration games by watching YouTube , 2018, NeurIPS.
[36] Marek Petrik,et al. Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs , 2019, NeurIPS.
[37] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[38] Shane Legg,et al. Deep Reinforcement Learning from Human Preferences , 2017, NIPS.
[39] Scott Niekum,et al. Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning , 2017, AAAI.
[40] R. A. Bradley,et al. Rank Analysis of Incomplete Block Designs: The Method of Paired Comparisons , 1952 .
[41] Brijen Thananjeyan,et al. Safety Augmented Value Estimation From Demonstrations (SAVED): Safe Deep Model-Based RL for Sparse Cost Robotic Tasks , 2020, IEEE Robotics and Automation Letters.
[42] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[43] Honglak Lee,et al. Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.
[44] Shie Mannor,et al. Optimizing the CVaR via Sampling , 2014, AAAI.
[45] Peter Stone,et al. Behavioral Cloning from Observation , 2018, IJCAI.
[46] Prabhat Nagarajan,et al. Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations , 2019, ICML.