Scale-free memory model for multiagent reinforcement learning. Mean field approximation and rock-paper-scissors dynamics

A continuous-time model is developed for multiagent systems governed by reinforcement learning with scale-free memory. The agents are assumed to act independently of one another in optimizing their choice among possible actions via trial-and-error search. To learn the value of each action, the agents accumulate in memory the rewards obtained from taking that action at each moment of time. The contribution of past rewards to an agent's current perception of action value is described by an integral operator with a power-law kernel. As a result, a fractional differential equation governing the system dynamics is obtained. The agents interact with one another only implicitly: the reward of one agent depends on the choices of the other agents. A pairwise interaction model is adopted to describe this effect. As a specific example of systems with non-transitive interactions, two-agent and three-agent systems of the rock-paper-scissors type are analyzed in detail, including stability analysis and numerical simulation. Scale-free memory is demonstrated to cause complex dynamics in these systems. In particular, it is shown that two instability modes can exist simultaneously, one undergoing a subcritical and the other a supercritical bifurcation, with the latter exhibiting anomalous oscillations whose amplitude and period grow with time. Moreover, the instability onset via this supercritical mode may be regarded as “altruism self-organization”. For the three-agent system the instability dynamics is found to be rather irregular and can be composed of alternating fragments of oscillations with different properties.
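
The power-law memory described above can be illustrated with a small numerical sketch. The abstract does not give the exact operator, so the form used here is an assumption: a Riemann-Liouville-type integral, Q(t) = (1/Γ(α)) ∫₀ᵗ (t − s)^(α−1) r(s) ds, with the reward r(s) treated as piecewise constant on steps of length dt. The function name `power_law_memory` and the parameters `alpha` and `dt` are hypothetical and chosen only for illustration; this is not the authors' scheme.

```python
import math


def power_law_memory(rewards, dt, alpha):
    """Accumulate rewards with a power-law kernel (assumed form, see lead-in).

    rewards[j] is the reward held constant on the interval [j*dt, (j+1)*dt].
    Returns an approximation of
        (1 / Gamma(alpha)) * integral_0^t (t - s)**(alpha - 1) * r(s) ds
    evaluated at t = len(rewards) * dt, for alpha > 0.
    """
    n = len(rewards)
    scale = dt ** alpha / math.gamma(alpha + 1.0)
    total = 0.0
    for j, r in enumerate(rewards):
        k = n - 1 - j  # number of whole steps between reward j and time t
        # Exact integral of the kernel over the interval of reward j,
        # in units of dt**alpha / Gamma(alpha + 1).
        weight = (k + 1) ** alpha - k ** alpha
        total += weight * r
    return scale * total


if __name__ == "__main__":
    # For alpha < 1 a recent reward counts more than an old one,
    # but the old one is never forgotten completely.
    old = power_law_memory([1.0, 0.0, 0.0, 0.0], dt=1.0, alpha=0.5)
    recent = power_law_memory([0.0, 0.0, 0.0, 1.0], dt=1.0, alpha=0.5)
    print(old, recent)
```

With `alpha = 1` the weights are all equal and the sum reduces to the ordinary running total of rewards times `dt`. For `0 < alpha < 1` the weights decay as a power of elapsed time rather than exponentially, which is the sense in which such memory has no characteristic time scale.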
