Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes
[1] Pieter Abbeel, et al. Safe Exploration in Markov Decision Processes, 2012, ICML.
[2] Nan Jiang, et al. Repeated Inverse Reinforcement Learning, 2017, NIPS.
[3] Craig Boutilier, et al. Robust Policy Computation in Reward-Uncertain MDPs Using Nondominated Policies, 2010, AAAI.
[4] Anca D. Dragan, et al. Inverse Reward Design, 2017, NIPS.
[5] Edmund H. Durfee, et al. Comparing Action-Query Strategies in Semi-Autonomous Agents, 2011, AAAI.
[6] Mausam, et al. A Theory of Goal-Oriented MDPs with Dead Ends, 2012, UAI.
[7] Craig Boutilier, et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.
[8] Anca D. Dragan, et al. SHIV: Reducing supervisor burden in DAgger using support vectors for efficient learning from demonstrations in high dimensional state spaces, 2016, IEEE International Conference on Robotics and Automation (ICRA).
[9] John Schulman, et al. Concrete Problems in AI Safety, 2016, ArXiv.
[10] Craig Boutilier, et al. Regret-based optimal recommendation sets in conversational recommender systems, 2009, RecSys '09.
[11] Laurent El Ghaoui, et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices, 2005, Oper. Res.
[12] Bruno Zanuttini, et al. Interactive Value Iteration for Markov Decision Processes with Unknown Rewards, 2013, IJCAI.
[13] Laurent Orseau, et al. AI Safety Gridworlds, 2017, ArXiv.
[14] Pieter Abbeel, et al. Constrained Policy Optimization, 2017, ICML.
[15] Edmund H. Durfee, et al. Symmetric approximate linear programming for factored MDPs with application to constrained problems, 2006, Annals of Mathematics and Artificial Intelligence.
[16] Florent Teichteil-Königsbuch. Stochastic Safest and Shortest Path Problems, 2012, AAAI.
[17] Edmund H. Durfee, et al. Influence-Based Policy Abstraction for Weakly-Coupled Dec-POMDPs, 2010, ICAPS.
[18] Anca D. Dragan, et al. Should Robots be Obedient?, 2017, IJCAI.
[19] Craig Boutilier, et al. Regret-based Reward Elicitation for Markov Decision Processes, 2009, UAI.
[20] Steffen Udluft, et al. Safe exploration for reinforcement learning, 2008, ESANN.
[21] Javier García, et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res.
[22] Edmund H. Durfee, et al. Approximately-Optimal Queries for Planning in Reward-Uncertain Markov Decision Processes, 2017, ICAPS.