Paper information - Chess as a Testing Grounds for the Oracle Approach to AI Safety

To reduce the danger of powerful super-intelligent AIs, we might make the first such AIs oracles that can only send and receive messages. This paper proposes a possibly practical means of using machine learning to create two classes of narrow AI oracles that would provide chess advice: those aligned with the player's interest, and those that want the player to lose and give deceptively bad advice. The player would be uncertain which type of oracle it was interacting with. As the oracles would be vastly more intelligent than the player in the domain of chess, experience with these oracles might help us prepare for future artificial general intelligence oracles.
