Simple Artificial Neural Networks That Match Probability and Exploit and Explore When Confronting a Multiarmed Bandit

The matching law (Herrnstein 1961) states that response rates become proportional to reinforcement rates; this is related to the empirical phenomenon called probability matching (Vulkan 2000). Here, we show that a simple artificial neural network generates responses consistent with probability matching. We then use this behavior to create an operant procedure for network learning. We use the multiarmed bandit (Gittins 1989), a classic problem of choice behavior, to illustrate that operant training balances exploiting the bandit arm expected to pay off most frequently with exploring other arms. Perceptrons provide a medium for relating results from neural networks, genetic algorithms, animal learning, contingency theory, reinforcement learning, and theories of choice.
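The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one logistic output unit per bandit arm, a delta-rule update without the derivative term, and a choice rule that pulls each arm with probability proportional to its current activity. The names `train_bandit`, `reward_probs`, `lr`, and `trials` are hypothetical and chosen only for illustration.

```python
import math
import random


def logistic(x):
    """Logistic activation, mapping net input to an activity in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def train_bandit(reward_probs, trials=20000, lr=0.02, seed=0):
    """Operant-style training on a multiarmed bandit (illustrative sketch).

    Assumption: each arm is a one-hot cue feeding a single logistic unit,
    so one weight per arm is enough to describe the network.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    weights = [0.0] * n  # all activities start at 0.5
    pulls = [0] * n
    for _ in range(trials):
        acts = [logistic(w) for w in weights]
        # Choose an arm with probability proportional to its activity.
        arm = rng.choices(range(n), weights=acts)[0]
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        # Delta rule on the chosen arm only: the expected update is zero
        # when the activity equals that arm's reward probability.
        weights[arm] += lr * (reward - acts[arm])
        pulls[arm] += 1
    return [logistic(w) for w in weights], pulls


if __name__ == "__main__":
    activities, pulls = train_bandit([0.2, 0.5, 0.8])
    print("activities:", [round(a, 2) for a in activities])
    print("pulls:", pulls)
```

Under these assumptions, each unit's activity drifts toward the reward probability of its arm, so the best arm is pulled most often (exploitation) while the other arms keep a nonzero share of pulls (exploration).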

[1]  W. Estes,et al.  Analysis of a verbal conditioning situation in terms of statistical learning theory , 1954 .

[2]  M. Dawson,et al.  Minds and Machines: Connectionism and Psychological Modeling , 2003 .

[3]  A. A. Mullin,et al.  Principles of neurodynamics , 1962 .

[4]  N. Longo Probability-Learning and Habit-Reversal in the Cockroach. , 1964, The American journal of psychology.

[5]  M. A. L. Thathachar,et al.  A new approach to the design of reinforcement schemes for learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.

[6]  Richard J. Herrnstein,et al.  Derivatives of Matching. , 1979 .

[7]  Richard S. Sutton,et al.  Reinforcement Learning , 1992, Handbook of Machine Learning.

[8]  D. Danks Equilibria of the Rescorla--Wagner model , 2003 .

[9]  Christian M. Ernst,et al.  Multi-armed Bandit Allocation Indices , 1989 .

[10]  M. Kalish,et al.  Connectionism: A Hands-On Approach, Michael R.W. Dawson. Blackwell (2005), £50.00 (hbk)/£19.99 (pbk), (200 pp.), ISBN: 1 405 13074 1 (hbk)/1 405 12807 0 , 2006 .

[11]  M. Dawson,et al.  Connectionism and Classical Conditioning , 2008 .

[12]  Nir Vulkan An Economist's Perspective on Probability Matching , 2000 .

[13]  R. Herrnstein,et al.  Toward a law of response strength. , 1976 .

[14]  Isaac Meilijson,et al.  Evolution of Reinforcement Learning in Uncertain Environments: A Simple Explanation for Complex Foraging Behaviors , 2002, Adapt. Behav..

[15]  Massimo Piattelli-Palmarini,et al.  Evolution, selection and cognition: From “learning” to parameter setting in biology and in the study of language , 1989, Cognition.

[16]  M. Bitterman,et al.  Further Experiments on Probability-Matching in the Pigeon. , 1964, Journal of the experimental analysis of behavior.

[17]  Andrew G. Barto,et al.  Reinforcement learning , 1998 .

[18]  Michael R. W. Dawson,et al.  Connectionist Selectionism: A Case Study of Parity , 2005 .

[19]  Zahra Ansari,et al.  The Quantitative Law of Effect is a Robust Emergent Property of an Evolutionary Algorithm for Reinforcement Learning , 2005, ECAL.

[20]  Michael R. W. Dawson,et al.  Autonomous processing in parallel distributed processing networks , 1992 .

[21]  Kevin D. Glazebrook,et al.  Multi-Armed Bandit Allocation Indices , 2011 .

[22]  J J McDowell,et al.  On the classic and modern theories of matching. , 2005, Journal of the experimental analysis of behavior.

[23]  R. J. Herrnstein,et al.  Relative and absolute strength of response as a function of frequency of reinforcement. , 1961, Journal of the experimental analysis of behavior.

[24]  M. E. Bitterman,et al.  Probability-Matching in the Fish , 1961 .

[25]  A G Barto,et al.  Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.

[26]  D. W. Hands The Matching Law: Papers In Psychology And Economics , 1999 .

[27]  J. Bather,et al.  Multi‐Armed Bandit Allocation Indices , 1990 .

[28]  J. J. McDowell,et al.  Undermatching is an emergent property of selection by consequences , 2007, Behavioural Processes.

[29]  David C. Palmer,et al.  Learning and Complex Behavior , 1993 .

[30]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[31]  J. J. McDowell,et al.  A computational theory of adaptive behavior based on an evolutionary reinforcement mechanism , 2006, GECCO.

[32]  M. Davison,et al.  The matching law: A research review. , 1988 .

[33]  M. Bitterman,et al.  Choice in honeybees as a function of the probability of reward , 1993 .

[34]  J J McDowell,et al.  A computational model of selection by consequences. , 2004, Journal of the experimental analysis of behavior.

[35]  R. Herrnstein,et al.  The Matching Law Papers in Psychology and Economics , 1997 .

[36]  F. Rosenblatt,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[37]  M. E. Bitterman,et al.  Probability-Learning by the Turtle , 1965, Science.

[38]  N. Newcombe,et al.  Is there a geometric module for spatial orientation? squaring theory and evidence , 2005, Psychonomic bulletin & review.

[40]  Tamar Keasar,et al.  Bees in two-armed bandit situations: foraging choices and possible decision mechanisms , 2002 .

[41]  R. Herrnstein,et al.  Maximizing and matching on concurrent ratio schedules. , 1975, Journal of the experimental analysis of behavior.

[42]  David R. Shanks,et al.  The Psychology of Associative Learning , 1995 .

[44]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[45]  W M Baum,et al.  On two types of deviation from the matching law: bias and undermatching. , 1974, Journal of the experimental analysis of behavior.