Evolutionary Reinforcement Learning Dynamics with Irreducible Environmental Uncertainty

In this work, we derive evolutionary reinforcement learning dynamics in which agents are irreducibly uncertain about the current state of the environment. We evaluate these dynamics across different classes of partially observable agent-environment systems and find that irreducible environmental uncertainty can lead to better learning outcomes in less time, stabilize the learning process, and overcome social dilemmas. However, as expected, we also find that partial observability can cause worse learning outcomes, for example in the form of a catastrophic limit cycle. Compared to fully observant agents, agents learning under irreducible environmental uncertainty often require more exploration and less weight on future rewards to obtain the best learning outcomes. Furthermore, we find a range of dynamical effects induced by partial observability, e.g., a critical slowing down of the learning process between reward regimes and the separation of the learning dynamics into fast and slow directions. The presented dynamics are a practical tool for researchers in biology, social science, and machine learning to systematically investigate the evolutionary effects of environmental uncertainty.
