The AGI Containment Problem

There is considerable uncertainty about what properties, capabilities, and motivations future AGIs will have. In some plausible scenarios, AGIs may pose security risks arising from accidents and defects. To mitigate these risks, prudent early AGI research teams will perform significant testing on their creations before use. Unfortunately, if an AGI has human-level or greater intelligence, testing itself may not be safe: some natural AGI goal systems create emergent incentives for AGIs to tamper with their test environments, make copies of themselves on the internet, or convince developers and operators to do dangerous things. In this paper, we survey the AGI containment problem, the question of how to build a container in which tests can be conducted safely and reliably, even on AGIs with unknown motivations and capabilities that could be dangerous. We identify requirements for AGI containers, available mechanisms, and weaknesses that need to be addressed.
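To make the notion of a container concrete, here is a minimal sketch, not taken from the paper, of one light-weight mechanism of the kind such a survey covers: running an untrusted program on Linux with no network access and hard resource caps. The helper `run_contained` and all of its parameters are hypothetical illustrations; the sketch assumes util-linux's `unshare` is installed and that the caller has the privileges (or unprivileged user namespaces) needed to create a network namespace.

```python
import resource
import subprocess

def run_contained(cmd, timeout_s=30, cpu_s=10, mem_bytes=512 * 1024 * 1024):
    """Run an untrusted command with no network access and hard resource caps.

    Illustrative only: relies on Linux and util-linux's `unshare`, and needs
    either root or unprivileged user namespaces to create the namespace.
    """
    def limit_resources():
        # Runs in the child between fork and exec; rlimits survive exec,
        # so they also bind the contained program.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_s, cpu_s))        # CPU seconds
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))  # address space

    # `unshare --net` places the child in a fresh, empty network namespace:
    # no interfaces besides an unconfigured loopback, hence no internet.
    return subprocess.run(
        ["unshare", "--net", "--"] + list(cmd),
        preexec_fn=limit_resources,
        capture_output=True,
        timeout=timeout_s,
    )

# Example: the contained process runs, but cannot reach the outside world.
result = run_contained(["python3", "-c", "print('hello from inside the container')"])
print(result.stdout.decode())
```

A single barrier like this would of course be insufficient on its own; containment designs of the kind discussed here layer multiple independent mechanisms, so that a breach of one does not compromise the whole test environment.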
