[1] Ankur Taly, et al. Axiomatic Attribution for Deep Networks, 2017, ICML.
[2] John Schulman, et al. Concrete Problems in AI Safety, 2016, ArXiv.
[3] N. Wiener. Some Moral and Technical Consequences of Automation, 1960, Science.
[4] Christopher Joseph Pal, et al. Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents, 2019, ICLR.
[5] Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, ArXiv.
[6] Prabhat Nagarajan, et al. Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations, 2019, ICML.
[7] Daniel Gómez, et al. Polynomial calculation of the Shapley value based on sampling, 2009, Comput. Oper. Res.
[8] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[9] Jonathan Dodge, et al. Visualizing and Understanding Atari Agents, 2017, ICML.
[10] Ziyan Wu, et al. Counterfactual Visual Explanations, 2019, ICML.
[11] Avanti Shrikumar, et al. Learning Important Features Through Propagating Activation Differences, 2017, ICML.
[12] Alex Mott, et al. Towards Interpretable Reinforcement Learning Using Attention Augmented Agents, 2019, NeurIPS.
[13] Sergey Levine, et al. Causal Confusion in Imitation Learning, 2019, NeurIPS.
[14] Been Kim, et al. Sanity Checks for Saliency Maps, 2018, NeurIPS.
[15] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[16] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[17] Anca D. Dragan, et al. Reward-rational (implicit) choice: A unifying formalism for reward learning, 2020, NeurIPS.
[18] Chris Russell, et al. Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR, 2017, ArXiv.
[19] Shane Legg, et al. Quantifying Differences in Reward Functions, 2020, ArXiv.
[20] Andrew Zisserman, et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, 2013, ICLR.
[21] Eliezer Yudkowsky. Artificial Intelligence as a Positive and Negative Factor in Global Risk, 2006.
[22] Shane Legg, et al. Reward learning from human preferences and demonstrations in Atari, 2018, NeurIPS.
[23] Abhishek Das, et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, 2017, ICCV.
[24] Eugene Santos, et al. Explaining Reward Functions in Markov Decision Processes, 2019, FLAIRS.
[25] Dylan Hadfield-Menell, et al. Multi-Principal Assistance Games, 2020, ArXiv.
[26] Anca D. Dragan, et al. Cooperative Inverse Reinforcement Learning, 2016, NIPS.
[27] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.