Paper information - Utility function security in artificially intelligent agents


The notion of ‘wireheading’, or direct stimulation of the brain's reward centre, is a well-known concept in neuroscience. In this paper, we examine the corresponding issue of reward (utility) function integrity in artificially intelligent machines. We survey the relevant literature and propose a number of potential solutions to ensure the integrity of our artificial assistants. Overall, we conclude that wireheading in rational self-improving optimisers above a certain capacity remains an unsolved problem, despite the opinion of many that such machines will choose not to wirehead. The related issue of literalness in goal setting also remains largely unsolved, and we suggest that the development of a non-ambiguous knowledge transfer language might be a step in the right direction.

Roman V. Yampolskiy
