Paper information - Simple Artificial Neural Networks That Match Probability and Exploit and Explore When Confronting a Multiarmed Bandit

The matching law (Herrnstein 1961) states that response rates become proportional to reinforcement rates; this is related to the empirical phenomenon called probability matching (Vulkan 2000). Here, we show that a simple artificial neural network generates responses consistent with probability matching. We then use this behavior to create an operant procedure for network learning. We use the multiarmed bandit (Gittins 1989), a classic problem of choice behavior, to illustrate that operant training balances exploiting the bandit arm expected to pay off most frequently with exploring other arms. Perceptrons provide a medium for relating results from neural networks, genetic algorithms, animal learning, contingency theory, reinforcement learning, and theories of choice.
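
The abstract does not specify the network architecture or the learning rule, so the following is only a minimal sketch of how probability matching and operant training on a bandit might fit together. It assumes one sigmoid output unit per arm driven by a single constant input, a delta-rule update applied only to the chosen arm, and arm selection in proportion to the unit outputs. The function name `train_bandit` and all parameter values are hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    """Logistic activation mapping a net input to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def train_bandit(reward_probs, trials=20000, lr=0.02, seed=0):
    """Operant training sketch (assumed setup, not the authors' procedure).

    Each arm has one output unit with a single weight on a constant input
    of 1. On every trial the network picks an arm with probability
    proportional to the unit outputs, observes a 0/1 reward, and applies
    a delta-rule update to the chosen arm's weight only.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    weights = [0.0] * n_arms   # every output starts at 0.5
    pulls = [0] * n_arms       # how often each arm was chosen

    for _ in range(trials):
        outputs = [sigmoid(w) for w in weights]
        # Choose an arm in proportion to its output: high outputs are
        # exploited more often, low outputs are still explored.
        arm = rng.choices(range(n_arms), weights=outputs)[0]
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        # Delta rule: the expected update is zero when the output equals
        # the arm's reward probability.
        weights[arm] += lr * (reward - outputs[arm])
        pulls[arm] += 1

    return [sigmoid(w) for w in weights], pulls


if __name__ == "__main__":
    outputs, pulls = train_bandit([0.2, 0.5, 0.8])
    print("estimated reward probabilities:", [round(o, 2) for o in outputs])
    print("pulls per arm:", pulls)
```

Under these assumptions each output should drift toward its arm's reward probability, so the choice proportions favor the richest arm while the other arms continue to be sampled.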

[1] W. Estes, et al. Analysis of a verbal conditioning situation in terms of statistical learning theory, 1954.

[2] M. Dawson, et al. Minds and Machines: Connectionism and Psychological Modeling, 2003.

[3] A. A. Mullin, et al. Principles of neurodynamics, 1962.

[4] N. Longo. Probability-learning and habit-reversal in the cockroach, 1964, The American Journal of Psychology.

[5] M. A. L. Thathachar, et al. A new approach to the design of reinforcement schemes for learning automata, 1985, IEEE Transactions on Systems, Man, and Cybernetics.

[6] Richard J. Herrnstein, et al. Derivatives of Matching, 1979.

[7] Richard S. Sutton, et al. Reinforcement Learning, 1992, Handbook of Machine Learning.

[8] D. Danks. Equilibria of the Rescorla-Wagner model, 2003.

[9] Christian M. Ernst, et al. Multi-armed Bandit Allocation Indices, 1989.

[10] M. Kalish, et al. Connectionism: A Hands-On Approach, Michael R.W. Dawson. Blackwell (2005), £50.00 (hbk)/£19.99 (pbk), (200 pp.), ISBN: 1 405 13074 1 (hbk)/1 405 12807 0, 2006.

[11] M. Dawson, et al. Connectionism and Classical Conditioning, 2008.

[12] Nir Vulkan. An Economist's Perspective on Probability Matching, 2000.

[13] R. Herrnstein, et al. Toward a law of response strength, 1976.

[14] Isaac Meilijson, et al. Evolution of Reinforcement Learning in Uncertain Environments: A Simple Explanation for Complex Foraging Behaviors, 2002, Adapt. Behav.

[15] Massimo Piattelli-Palmarini, et al. Evolution, selection and cognition: From “learning” to parameter setting in biology and in the study of language, 1989, Cognition.

[16] M. Bitterman, et al. Further experiments on probability-matching in the pigeon, 1964, Journal of the Experimental Analysis of Behavior.

[17] Andrew G. Barto, et al. Reinforcement learning, 1998.

[18] Michael R. W. Dawson, et al. Connectionist Selectionism: A Case Study of Parity, 2005.

[19] Zahra Ansari, et al. The Quantitative Law of Effect is a Robust Emergent Property of an Evolutionary Algorithm for Reinforcement Learning, 2005, ECAL.

[20] Michael R. W. Dawson, et al. Autonomous processing in parallel distributed processing networks, 1992.

[21] Kevin D. Glazebrook, et al. Multi-Armed Bandit Allocation Indices: Gittins/Multi-Armed Bandit Allocation Indices, 2011.

[22] J. J. McDowell, et al. On the classic and modern theories of matching, 2005, Journal of the Experimental Analysis of Behavior.

[23] R. J. Herrnstein, et al. Relative and absolute strength of response as a function of frequency of reinforcement, 1961, Journal of the Experimental Analysis of Behavior.

[24] M. E. Bitterman, et al. Probability-Matching in the Fish, 1961.

[25] A. G. Barto, et al. Toward a modern theory of adaptive networks: expectation and prediction, 1981, Psychological Review.

[26] D. W. Hands. The Matching Law: Papers In Psychology And Economics, 1999.

[27] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.

[28] J. J. McDowell, et al. Undermatching is an emergent property of selection by consequences, 2007, Behavioural Processes.

[29] David C. Palmer, et al. Learning and Complex Behavior, 1993.

[30] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[31] J. J. McDowell, et al. A computational theory of adaptive behavior based on an evolutionary reinforcement mechanism, 2006, GECCO.

[32] M. Davison, et al. The matching law: A research review, 1988.

[33] M. Bitterman, et al. Choice in honeybees as a function of the probability of reward, 1993.

[34] J. J. McDowell, et al. A computational model of selection by consequences, 2004, Journal of the Experimental Analysis of Behavior.

[35] R. Herrnstein, et al. The Matching Law: Papers in Psychology and Economics, 1997.

[36] F. Rosenblatt, et al. The perceptron: a probabilistic model for information storage and organization in the brain, 1958, Psychological Review.

[37] M. E. Bitterman, et al. Probability-Learning by the Turtle, 1965, Science.

[38] N. Newcombe, et al. Is there a geometric module for spatial orientation? Squaring theory and evidence, 2005, Psychonomic Bulletin & Review.

[39] Richard J. Herrnstein, et al. Maximizing and matching on concurrent ratio schedules, 1975.

[40] Tamar Keasar, et al. Bees in two-armed bandit situations: foraging choices and possible decision mechanisms, 2002.

[41] R. Herrnstein, et al. Maximizing and matching on concurrent ratio schedules, 1975, Journal of the Experimental Analysis of Behavior.

[42] David R. Shanks, et al. The Psychology of Associative Learning, 1995.

[43] R. Herrnstein, et al. Toward a law of response strength, 1976.

[44] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[45] W. M. Baum, et al. On two types of deviation from the matching law: bias and undermatching, 1974, Journal of the Experimental Analysis of Behavior.