Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences
[1] S. Levine, et al. Safety Augmented Value Estimation From Demonstrations (SAVED): Safe Deep Model-Based RL for Sparse Cost Robotic Tasks, 2019, IEEE Robotics and Automation Letters.
[2] Peter Stone, et al. Stochastic Grounded Action Transformation for Robot Learning in Simulation, 2017, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[3] Richard Zemel, et al. A Divergence Minimization Perspective on Imitation Learning Methods, 2019, CoRL.
[4] Dorsa Sadigh, et al. Asking Easy Questions: A User-Friendly Approach to Active Reward Learning, 2019, CoRL.
[5] A. Vries. Value at Risk, 2019, Derivatives.
[6] Dorsa Sadigh, et al. Learning Reward Functions by Integrating Human Demonstrations and Preferences, 2019, Robotics: Science and Systems.
[7] Sergey Levine, et al. Causal Confusion in Imitation Learning, 2019, NeurIPS.
[8] Prabhat Nagarajan, et al. Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations, 2019, ICML.
[9] Guodong Zhang, et al. Functional Variational Bayesian Neural Networks, 2019, ICLR.
[10] Marek Petrik, et al. Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs, 2019, NeurIPS.
[11] Andrew Gordon Wilson, et al. A Simple Baseline for Bayesian Uncertainty in Deep Learning, 2019, NeurIPS.
[12] Peter Eckersley. Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function), 2019, SafeAI@AAAI.
[13] Marco Pavone, et al. Risk-Sensitive Generative Adversarial Imitation Learning, 2018, AISTATS.
[14] Katherine Rose Driggs-Campbell, et al. EnsembleDAgger: A Bayesian Approach to Safe Imitation Learning, 2018, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[15] Peter Stone, et al. Importance Sampling Policy Evaluation with an Estimated Behavior Policy, 2018, ICML.
[16] Finale Doshi-Velez, et al. Projected BNNs: Avoiding weight-space pathologies by learning latent representations of neural network weights, 2018, arXiv:1811.07006.
[17] Finale Doshi-Velez, et al. Latent Projection BNNs: Avoiding weight-space pathologies by learning latent representations of neural network weights, 2018, ArXiv.
[18] Shane Legg, et al. Reward learning from human preferences and demonstrations in Atari, 2018, NeurIPS.
[19] Yuchen Cui, et al. Risk-Aware Active Inverse Reinforcement Learning, 2018, CoRL.
[20] Anca D. Dragan, et al. Learning under Misspecified Objective Spaces, 2018, CoRL.
[21] Zoubin Ghahramani, et al. Variational Bayesian dropout: pitfalls and fixes, 2018, ICML.
[22] Didrik Nielsen, et al. Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam, 2018, ICML.
[23] Nando de Freitas, et al. Playing hard exploration games by watching YouTube, 2018, NeurIPS.
[24] Yang Cai, et al. Learning Safe Policies with Expert Guidance, 2018, NeurIPS.
[25] Peter Stone, et al. Behavioral Cloning from Observation, 2018, IJCAI.
[26] Yuchen Cui, et al. Active Reward Learning from Critiques, 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[27] Sergey Levine, et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, 2017, ICLR.
[28] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[29] Scott Niekum, et al. Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning, 2017, AAAI.
[30] Laurent Orseau, et al. AI Safety Gridworlds, 2017, ArXiv.
[31] Anca D. Dragan, et al. Inverse Reward Design, 2017, NIPS.
[32] Katherine Rose Driggs-Campbell, et al. DropoutDAgger: A Bayesian Approach to Safe Imitation Learning, 2017, ArXiv.
[33] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[34] Marco Pavone, et al. Risk-sensitive Inverse Reinforcement Learning via Coherent Risk Models, 2017, Robotics: Science and Systems.
[35] Anca D. Dragan, et al. Active Preference-Based Learning of Reward Functions, 2017, Robotics: Science and Systems.
[36] Anca D. Dragan, et al. Pragmatic-Pedagogic Value Alignment, 2017, ISRR.
[37] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[38] Brendan J. Frey, et al. PixelGAN Autoencoders, 2017, NIPS.
[39] Anca D. Dragan, et al. Should Robots be Obedient?, 2017, IJCAI.
[40] Anca D. Dragan, et al. DART: Noise Injection for Robust Imitation Learning, 2017, CoRL.
[41] Alex Kendall, et al. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, 2017, NIPS.
[42] Kyunghyun Cho, et al. Query-Efficient Imitation Learning for End-to-End Simulated Driving, 2017, AAAI.
[43] Charles Blundell, et al. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, 2016, NIPS.
[44] Tom Schaul, et al. Successor Features for Transfer in Reinforcement Learning, 2016, NIPS.
[45] Sergey Levine, et al. A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models, 2016, ArXiv.
[46] Dilin Wang, et al. Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm, 2016, NIPS.
[47] Marek Petrik, et al. Safe Policy Improvement by Minimizing Robust Baseline Regret, 2016, NIPS.
[48] John Schulman, et al. Concrete Problems in AI Safety, 2016, ArXiv.
[49] Carl Doersch, et al. Tutorial on Variational Autoencoders, 2016, ArXiv.
[50] Stefano Ermon, et al. Generative Adversarial Imitation Learning, 2016, NIPS.
[51] Anca D. Dragan, et al. Cooperative Inverse Reinforcement Learning, 2016, NIPS.
[52] Sergey Levine, et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, 2016, ICML.
[53] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.
[54] Ian Osband. Risk versus Uncertainty in Deep Learning: Bayes, Bootstrap and the Dangers of Dropout, 2016.
[55] Honglak Lee, et al. Action-Conditional Video Prediction using Deep Networks in Atari Games, 2015, NIPS.
[56] Shie Mannor, et al. Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach, 2015, NIPS.
[57] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[58] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[59] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[60] Shie Mannor, et al. Optimizing the CVaR via Sampling, 2014, AAAI.
[61] Javier García, et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res.
[62] Maksims Volkovs, et al. New learning methods for supervised and unsupervised preference aggregation, 2014, J. Mach. Learn. Res.
[63] Robin S. Spruce, et al. Learning to be a Learner, 2021.
[64] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[65] Alan Fern, et al. A Bayesian Approach for Policy Learning from Trajectory Preference Queries, 2012, NIPS.
[66] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[67] Peter Stone, et al. Transfer Learning for Reinforcement Learning Domains: A Survey, 2009, J. Mach. Learn. Res.
[68] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.
[69] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[70] Eyal Amir, et al. Bayesian Inverse Reinforcement Learning, 2007, IJCAI.
[71] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[72] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[73] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[74] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[75] Leslie Pack Kaelbling, et al. On the Complexity of Solving Markov Decision Problems, 1995, UAI.
[76] David J. C. MacKay. A Practical Bayesian Framework for Backpropagation Networks, 1992, Neural Computation.
[77] Dean Pomerleau. Efficient Training of Artificial Neural Networks for Autonomous Navigation, 1991, Neural Computation.
[78] R. A. Bradley, et al. Rank Analysis of Incomplete Block Designs: The Method of Paired Comparisons, 1952, Biometrika.