Anca D. Dragan | Marek Petrik | Ashwin Balakrishna | Ken Goldberg | Jerry Zhu | Satvik Sharma | Daniel S. Brown | Zaynah Javed
[1] Stefano Ermon,et al. Generative Adversarial Imitation Learning , 2016, NIPS.
[2] Anca D. Dragan,et al. Simplifying Reward Design through Divide-and-Conquer , 2018, Robotics: Science and Systems.
[3] Ruslan Salakhutdinov,et al. Worst Cases Policy Gradients , 2019, CoRL.
[4] Pieter Abbeel,et al. An Algorithmic Perspective on Imitation Learning , 2018, Found. Trends Robotics.
[5] Reazul Hasan Russel,et al. Entropic Risk Constrained Soft-Robust Policy Optimization , 2020, ArXiv.
[6] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[7] Philippe Artzner,et al. Coherent Measures of Risk , 1999 .
[8] R. Rockafellar,et al. Optimization of Conditional Value-at-Risk , 2000 .
[9] Anca D. Dragan,et al. Learning a Prior over Intent via Meta-Inverse Reinforcement Learning , 2018, ICML.
[10] F. Delbaen. Coherent Risk Measures on General Probability Spaces , 2002 .
[11] Scott Niekum,et al. Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences , 2020, ICML.
[12] Brijen Thananjeyan,et al. LazyDAgger: Reducing Context Switching in Interactive Imitation Learning , 2021, 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE).
[13] Prashant Doshi,et al. A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress , 2018, Artif. Intell..
[14] Marek Petrik,et al. Bayesian Robust Optimization for Imitation Learning , 2020, NeurIPS.
[15] Sergey Levine,et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning , 2017, ICLR.
[16] Mohammad Ghavamzadeh,et al. Soft-Robust Algorithms for Handling Model Misspecification , 2020, ArXiv.
[17] Stefan Schaal,et al. Reinforcement Learning of Motor Skills with Policy Gradients , 2008, Neural Networks.
[18] Shimon Whiteson,et al. Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning , 2020, AAAI.
[19] Jan Peters,et al. Entropic Risk Measure in Policy Search , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[20] H. Föllmer,et al. Entropic Risk Measures: Coherence vs. Convexity, Model Ambiguity and Robust Large Deviations , 2011 .
[21] Javier García,et al. A Comprehensive Survey on Safe Reinforcement Learning , 2015, J. Mach. Learn. Res..
[22] Sergey Levine,et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization , 2016, ICML.
[23] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[24] Matthias Heger,et al. Consideration of Risk in Reinforcement Learning , 1994, ICML.
[25] Eyal Amir,et al. Bayesian Inverse Reinforcement Learning , 2007, IJCAI.
[26] Marco Pavone,et al. Risk-sensitive Inverse Reinforcement Learning via Coherent Risk Models , 2017, Robotics: Science and Systems.
[27] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[28] S. Levine,et al. Safety Augmented Value Estimation From Demonstrations (SAVED): Safe Deep Model-Based RL for Sparse Cost Robotic Tasks , 2019, IEEE Robotics and Automation Letters.
[29] John Schulman,et al. Concrete Problems in AI Safety , 2016, ArXiv.
[30] Anca D. Dragan,et al. Active Preference-Based Learning of Reward Functions , 2017, Robotics: Science and Systems.
[31] Dean Pomerleau,et al. Efficient Training of Artificial Neural Networks for Autonomous Navigation , 1991, Neural Computation.
[32] Kyunghyun Cho,et al. Query-Efficient Imitation Learning for End-to-End Autonomous Driving , 2016, ArXiv.
[33] Shie Mannor,et al. Optimizing the CVaR via Sampling , 2014, AAAI.
[34] Prabhat Nagarajan,et al. Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations , 2019, ICML.
[35] Scott Niekum,et al. Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning , 2017, AAAI.
[36] Anca D. Dragan,et al. Inverse Reward Design , 2017, NIPS.
[37] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[38] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[39] Michael H. Bowling,et al. Apprenticeship learning using linear programming , 2008, ICML '08.
[40] Peter Stone,et al. Behavioral Cloning from Observation , 2018, IJCAI.
[41] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[42] Jaime F. Fisac,et al. A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems , 2017, IEEE Transactions on Automatic Control.
[43] Peter Stone,et al. Reinforcement learning from simultaneous human and MDP reward , 2012, AAMAS.
[44] Marco Pavone,et al. Risk-Sensitive Generative Adversarial Imitation Learning , 2018, AISTATS.
[45] J. Andrew Bagnell,et al. Efficient Reductions for Imitation Learning , 2010, AISTATS.
[46] Brijen Thananjeyan,et al. Recovery RL: Safe Reinforcement Learning With Learned Recovery Zones , 2020, IEEE Robotics and Automation Letters.
[47] Shie Mannor,et al. Soft-Robust Actor-Critic Policy-Gradient , 2018, UAI.
[48] Shie Mannor,et al. Policy Gradients Beyond Expectations: Conditional Value-at-Risk , 2014, ArXiv.
[49] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[50] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[51] Andrea Lockerd Thomaz,et al. Policy Shaping: Integrating Human Feedback with Reinforcement Learning , 2013, NIPS.
[52] Pieter Abbeel,et al. CURL: Contrastive Unsupervised Representations for Reinforcement Learning , 2020, ICML.
[53] Shie Mannor,et al. Percentile Optimization for Markov Decision Processes with Parameter Uncertainty , 2010, Oper. Res..
[54] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[55] Yang Cai,et al. Learning Safe Policies with Expert Guidance , 2018, NeurIPS.
[56] Kee-Eung Kim,et al. MAP Inference for Bayesian Inverse Reinforcement Learning , 2011, NIPS.
[57] Craig Boutilier,et al. Regret-based Reward Elicitation for Markov Decision Processes , 2009, UAI.
[58] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[59] Shane Legg,et al. Deep Reinforcement Learning from Human Preferences , 2017, NIPS.
[60] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[61] Brijen Thananjeyan,et al. ABC-LMPC: Safe Sample-Based Learning MPC for Stochastic Nonlinear Dynamical Systems with Adjustable Boundary Conditions , 2020, WAFR.