An Action-Selection Calculus

This article describes a unifying framework for five highly influential but disparate theories of natural learning and behavioral action selection. These theories are normally considered independently, each with its own experimental procedures and results. The framework builds on a structure of connection types, propagation rules and learning rules, which are used in combination to integrate results from each theory into a single whole. These connection types and rules form the action-selection calculus. The calculus is used to discuss the areas of genuine difference between the five theories, and to identify areas where they overlap and where apparently disparate findings have a common source. The discussion is illustrated with exemplar experimental procedures. The article focuses on the predictive or anticipatory properties inherent in these action-selection and learning theories, and uses the dynamic expectancy model and its computer implementation, SRS/E, as the vehicle for this discussion.
