Scott Garrabrant | Joar Skalse | Vladimir Mikulik | Evan Hubinger | Chris van Merwijk
[1] Allan Jabri, et al. Universal Planning Networks, 2018, ICML.
[2] Chico Q. Camargo, et al. Deep learning generalizes because the parameter-function map is biased towards simple functions, 2018, ICLR.
[3] Shimon Whiteson, et al. TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning, 2017, ICLR.
[4] Shane Legg, et al. Scalable agent alignment via reward modeling: a research direction, 2018, ArXiv.
[5] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[6] Dario Amodei, et al. Supervising strong learners by amplifying weak experts, 2018, ArXiv.
[7] Sergiu Hart, et al. The Absent-Minded Driver, 1996, TARK.
[8] Mykel J. Kochenderfer, et al. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, 2017, CAV.
[9] Razvan Pascanu, et al. Learning model-based planning from scratch, 2017, ArXiv.
[10] Nick Bostrom, et al. Superintelligence: Paths, Dangers, Strategies, 2014.
[11] Scott Garrabrant, et al. Categorizing Variants of Goodhart's Law, 2018, ArXiv.
[12] Stuart Armstrong, et al. Occam's razor is insufficient to infer the preferences of irrational agents, 2017, NeurIPS.
[13] Zeb Kurth-Nelson, et al. Learning to reinforcement learn, 2016, CogSci.
[14] Min Wu, et al. Safety Verification of Deep Neural Networks, 2016, CAV.
[15] Peter L. Bartlett, et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[16] Shane Legg, et al. Reward learning from human preferences and demonstrations in Atari, 2018, NeurIPS.
[17] Junfeng Yang, et al. Towards Practical Verification of Machine Learning: The Case of Computer Vision Systems, 2017, ArXiv.
[18] Demis Hassabis, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, 2018, Science.
[19] Kareem Amin, et al. Towards Resolving Unidentifiability in Inverse Reinforcement Learning, 2016, ArXiv.
[20] Kouichi Sakurai, et al. One Pixel Attack for Fooling Deep Neural Networks, 2017, IEEE Transactions on Evolutionary Computation.