The power of associative learning and the ontogeny of optimal behaviour

Behaving efficiently (optimally or near-optimally) is central to animals' adaptation to their environment. Much evolutionary biology assumes, implicitly or explicitly, that optimal behavioural strategies are genetically inherited, yet the behaviour of many animals depends crucially on learning. The question of how learning contributes to optimal behaviour is largely open. Here we propose an associative learning model that can learn optimal behaviour in a wide variety of ecologically relevant circumstances. The model learns through chaining, a term introduced by Skinner to indicate learning of behaviour sequences by linking together shorter sequences or single behaviours. Our model formalizes the concept of conditioned reinforcement (the learning process that underlies chaining) and is closely related to optimization algorithms from machine learning. Our analysis dispels the common belief that associative learning is too limited to produce ‘intelligent’ behaviour such as tool use, social learning, self-control or expectations of the future. Furthermore, the model readily accounts for both instinctual and learned aspects of behaviour, clarifying how genetic evolution and individual learning complement each other, and bridging a long-standing divide between ethology and psychology. We conclude that associative learning, supported by genetic predispositions and including the oft-neglected phenomenon of conditioned reinforcement, may suffice to explain the ontogeny of optimal behaviour in most, if not all, non-human animals. Our results establish associative learning as a more powerful optimizing mechanism than acknowledged by current opinion.
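The chaining idea described above can be illustrated with a small simulation. The abstract does not give the model's equations, so this is only a minimal sketch under stated assumptions: a temporal-difference-style update in which each stimulus acquires a conditioned reinforcement value `w`, and each stimulus-behaviour pair acquires a value `v` that is reinforced by the primary reward plus the conditioned value of the next stimulus. The function name `train_chain`, the two-behaviour setup, the softmax-style choice rule and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def train_chain(n_steps=3, episodes=2000, alpha_v=0.1, alpha_w=0.1,
                beta=5.0, reward=1.0, seed=0):
    """Learn a behaviour sequence of length n_steps by chaining.

    Assumed toy task (not from the source): at each stimulus s the
    animal picks behaviour 1 (advance to the next stimulus) or
    behaviour 0 (the attempt ends without reward). Primary reward is
    delivered only after the last correct behaviour.
    """
    rng = random.Random(seed)
    # v[s][b]: learned value of performing behaviour b at stimulus s.
    v = [[0.0, 0.0] for _ in range(n_steps)]
    # w[s]: conditioned reinforcement value of stimulus s.
    # Index n_steps is the rewarded end state; its w stays at 0.
    w = [0.0] * (n_steps + 1)

    for _ in range(episodes):
        s = 0
        while s < n_steps:
            # Two-option softmax: probability of choosing behaviour 1.
            p_advance = 1.0 / (1.0 + math.exp(-beta * (v[s][1] - v[s][0])))
            b = 1 if rng.random() < p_advance else 0

            if b == 1:
                s_next = s + 1
                primary = reward if s_next == n_steps else 0.0
                # Reinforcement = primary value + conditioned value
                # of the stimulus that the behaviour leads to.
                target = primary + w[s_next]
            else:
                s_next = None
                target = 0.0

            # Update the behaviour value and the stimulus value
            # towards the experienced reinforcement.
            v[s][b] += alpha_v * (target - v[s][b])
            w[s] += alpha_w * (target - w[s])

            if s_next is None:
                break
            s = s_next
    return v, w


if __name__ == "__main__":
    v, w = train_chain()
    for s, (v_stop, v_advance) in enumerate(v):
        print(f"stimulus {s}: v(stop)={v_stop:.2f}, "
              f"v(advance)={v_advance:.2f}, w={w[s]:.2f}")
```

In this sketch the last step of the sequence is learned first, because only it is followed by primary reward. Once the final stimulus has acquired a positive `w`, it can reinforce the behaviour that precedes it, and value propagates backwards along the sequence one link at a time.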
