Deep reinforcement learning in World-Earth system models to discover sustainable management strategies

Increasingly complex nonlinear World-Earth system models are used to describe the dynamics of the biophysical Earth system, the socioeconomic and sociocultural World of human societies, and the interactions between them. Identifying pathways toward a sustainable future in these models, e.g., pathways leading to robust mitigation of dangerous anthropogenic climate change, in order to inform policymakers and the wider public is a challenging and widely investigated task in climate research and broader Earth system science. The problem is particularly difficult when pathways must also avoid transgressing planetary boundaries and social foundations. In this work, we propose to combine recently developed machine learning techniques, namely deep reinforcement learning (DRL), with classical analysis of trajectories in the World-Earth system. Based on the concept of the agent-environment interface, we develop an agent that is able to act and learn in a variety of manageable environment models of the Earth system. We demonstrate the potential of our framework by applying DRL algorithms to two stylized World-Earth system models. Conceptually, we thereby explore the feasibility of finding novel global governance policies that lead into a safe and just operating space constrained by certain planetary and socioeconomic boundaries. The artificially intelligent agent learns that the timing of a specific mix of taxing carbon emissions and subsidizing renewables is of crucial relevance for finding World-Earth system trajectories that are sustainable in the long term.
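
The agent-environment interface mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the one-variable `ToyWorldEarthEnv`, its parameter values, the reward of 1 per step spent inside the boundary, and the hand-written policies are hypothetical and are not the stylized models or the DRL agent used in this work. A learning agent would take the place of the fixed `policy` function.

```python
from dataclasses import dataclass


@dataclass
class ToyWorldEarthEnv:
    """Hypothetical one-variable stand-in for a stylized World-Earth model."""
    carbon_boundary: float = 1.0  # assumed planetary boundary on the carbon stock
    emissions: float = 0.10       # assumed untaxed emissions per step
    tax_cut: float = 0.08         # assumed emission reduction under a carbon tax
    uptake: float = 0.05          # assumed fraction of the stock removed per step
    horizon: int = 100            # maximum episode length
    carbon: float = 0.5           # current carbon stock (the observed state)
    t: int = 0                    # current time step

    def reset(self) -> float:
        """Start a new episode and return the initial state."""
        self.carbon, self.t = 0.5, 0
        return self.carbon

    def step(self, action: int):
        """Apply an action (0 = no policy, 1 = carbon tax) for one time step."""
        emitted = self.emissions - (self.tax_cut if action == 1 else 0.0)
        self.carbon += emitted - self.uptake * self.carbon
        self.t += 1
        inside = self.carbon < self.carbon_boundary
        reward = 1.0 if inside else 0.0  # reward only while inside the boundary
        done = (not inside) or self.t >= self.horizon
        return self.carbon, reward, done


def run_episode(env, policy) -> float:
    """Run one episode of the agent-environment loop and return the total reward."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(state)                  # agent chooses an action
        state, reward, done = env.step(action)  # environment responds
        total += reward
    return total


def never_tax(state: float) -> int:
    return 0


def always_tax(state: float) -> int:
    return 1


def tax_when_high(state: float) -> int:
    # Hypothetical threshold rule: tax only once the stock exceeds 0.8.
    return 1 if state > 0.8 else 0


if __name__ == "__main__":
    for policy in (never_tax, always_tax, tax_when_high):
        print(policy.__name__, run_episode(ToyWorldEarthEnv(), policy))
```

In this toy setting the untaxed trajectory crosses the boundary and the episode ends early, while both taxing policies remain inside it for the full horizon. The threshold rule hints at why the timing of an intervention can matter, although the sketch makes no claim about the policies found in the actual models.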
