[1] Laurent Orseau, et al. AI Safety Gridworlds, 2017, ArXiv.
[2] Scott Garrabrant, et al. Embedded Agency, 2019, ArXiv.
[3] Laurent Orseau, et al. Measuring and avoiding side effects using relative reachability, 2018, ArXiv.
[4] Stephen M. Omohundro, et al. The Basic AI Drives, 2008, AGI.
[5] Dylan Hadfield-Menell, et al. Conservative Agency via Attainable Utility Preservation, 2019, AIES.
[6] Marcus Hutter, et al. AGI Safety Literature Review, 2018, IJCAI.
[7] Shane Legg, et al. The Incentives that Shape Behaviour, 2020, ArXiv.
[8] Stuart Armstrong, et al. Motivated Value Selection for Artificial Agents, 2015, AAAI Workshop: AI and Ethics.
[9] Ramana Kumar, et al. Modeling AGI Safety Frameworks with Causal Influence Diagrams, 2019, AISafety@IJCAI.
[10] John Schulman, et al. Concrete Problems in AI Safety, 2016, ArXiv.
[11] Marcus Hutter, et al. Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective, 2019, Synthese.
[12] R. Dechter, et al. Heuristics, Probability and Causality: A Tribute to Judea Pearl, 2010.
[13] Craig Boutilier, et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.
[14] Illtyd Trethowan. Causality, 1938.
[15] Laurent Orseau, et al. Safely Interruptible Agents, 2016, UAI.
[16] Anca D. Dragan, et al. The Off-Switch Game, 2016, IJCAI.
[17] Javier García, et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res.
[18] Koen Holtman, et al. Corrigibility with Utility Preservation, 2019, ArXiv.
[19] Stuart Armstrong, et al. 'Indifference' methods for managing agent rewards, 2017, ArXiv.