The Skinner automaton: A psychological model formalizing the theory of operant conditioning

Operant conditioning is one of the fundamental mechanisms of animal learning; it holds that the behavior of all animals, from protists to humans, is guided by its consequences. We present a new stochastic learning automaton, the Skinner automaton, as a psychological model that formalizes the theory of operant conditioning. We identify animal operant learning with a thermodynamic process and derive a Skinner algorithm from the Monte Carlo method, the Metropolis algorithm, and simulated annealing. Under certain conditions, we prove that the Skinner automaton is expedient, ε-optimal, and optimal, and that the operant probabilities converge to the set of stable roots with probability 1. The Skinner automaton enables machines to learn autonomously in an animal-like way.
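To make the idea concrete, the following is a minimal sketch (not the paper's exact Skinner algorithm) of a stochastic learning automaton whose operant probabilities are reshaped by a Boltzmann distribution under a decreasing temperature, in the spirit of the Metropolis/simulated-annealing derivation mentioned above. The environment, reward probabilities, and cooling schedule are illustrative assumptions, not taken from the paper.

```python
import math
import random

def environment(action):
    """Hypothetical stationary environment: each action (operant) has a
    fixed probability of producing a reinforcement of 1."""
    reward_probs = [0.2, 0.5, 0.8]  # assumed values for illustration
    return 1.0 if random.random() < reward_probs[action] else 0.0

def skinner_like_automaton(n_actions=3, n_steps=5000, t0=1.0, cooling=0.999):
    # Start with uniform operant probabilities over the action set.
    probs = [1.0 / n_actions] * n_actions
    values = [0.0] * n_actions      # running reward estimates per action
    counts = [0] * n_actions
    temperature = t0

    for _ in range(n_steps):
        # Sample an action according to the current operant probabilities.
        action = random.choices(range(n_actions), weights=probs)[0]
        reward = environment(action)

        # Incrementally update the reward estimate for the chosen action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

        # Boltzmann (Gibbs) reshaping of the operant probabilities:
        # higher estimated reward -> higher probability, sharpened as the
        # temperature is annealed toward zero.
        exps = [math.exp(v / max(temperature, 1e-6)) for v in values]
        total = sum(exps)
        probs = [e / total for e in exps]

        temperature *= cooling      # geometric cooling schedule (assumed)

    return probs

if __name__ == "__main__":
    final_probs = skinner_like_automaton()
    print("Final operant probabilities:", [round(p, 3) for p in final_probs])
```

As the temperature decreases, probability mass concentrates on the action with the highest estimated reinforcement, which illustrates the kind of ε-optimal behavior the abstract claims for the full construction.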
