Utility function security in artificially intelligent agents

‘Wireheading’, the direct stimulation of the brain's reward centre, is a well-known concept in neuroscience. In this paper, we examine the corresponding issue of reward (utility) function integrity in artificially intelligent machines. We survey the relevant literature and propose a number of potential solutions to ensure the integrity of our artificial assistants. Overall, we conclude that wireheading in rational self-improving optimisers above a certain capacity remains an unsolved problem, despite the opinion of many that such machines will choose not to wirehead. The related issue of literalness in goal setting also remains largely unsolved, and we suggest that the development of an unambiguous knowledge transfer language might be a step in the right direction.
