Space Debris Removal: Learning to Cooperate and the Price of Anarchy

In this paper we study space debris removal from a game-theoretic perspective. In particular, we focus on the question of whether and how self-interested agents can cooperate in this dilemma, which resembles a tragedy of the commons scenario. We compare centralised and decentralised solutions and the corresponding price of anarchy, which measures the extent to which competition approximates cooperation. In addition, we investigate whether agents can learn optimal strategies by reinforcement learning. To this end, we improve on an existing high-fidelity orbital simulator and use it to obtain a computationally efficient surrogate model for our subsequent game-theoretic analysis. We study both single- and multi-agent approaches using stochastic (Markov) games and reinforcement learning. The main finding is that the cost of a decentralised, competitive solution can be significant, which should be taken into consideration when forming debris removal strategies.
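
To make the price of anarchy concrete, the following is a minimal sketch of a toy two-agent removal game. It is not the paper's model: the names `REMOVAL_COST` and `RISK` and all numerical values are hypothetical and chosen only for illustration. Each agent either removes debris (1) or does not (0); the price of anarchy is taken here as the social cost of the worst pure Nash equilibrium divided by the social cost of the centralised optimum.

```python
from itertools import product

# Hypothetical values, for illustration only (not taken from the paper).
REMOVAL_COST = 3.0                    # cost an agent pays for removing debris
RISK = {0: 4.0, 1: 2.0, 2: 0.0}       # collision-risk cost borne by EACH agent,
                                      # indexed by the total number of removals

def agent_cost(profile, i):
    """Cost to agent i: its own removal cost plus the shared risk cost."""
    return REMOVAL_COST * profile[i] + RISK[sum(profile)]

def social_cost(profile):
    """Total cost summed over all agents (the centralised objective)."""
    return sum(agent_cost(profile, i) for i in range(len(profile)))

def is_pure_nash(profile):
    """True if no agent can lower its own cost by unilaterally switching action."""
    for i in range(len(profile)):
        deviation = list(profile)
        deviation[i] = 1 - profile[i]
        if agent_cost(tuple(deviation), i) < agent_cost(profile, i):
            return False
    return True

profiles = list(product((0, 1), repeat=2))        # all joint actions
equilibria = [p for p in profiles if is_pure_nash(p)]
optimum = min(profiles, key=social_cost)          # centralised solution

# Worst equilibrium cost relative to the optimal cost.
price_of_anarchy = max(social_cost(p) for p in equilibria) / social_cost(optimum)

print("equilibria:", equilibria)
print("optimum:", optimum)
print("price of anarchy:", round(price_of_anarchy, 3))
```

With these assumed numbers, not removing is the better reply whatever the other agent does, so the only equilibrium is that nobody removes debris, even though both agents removing gives the lowest total cost. This mirrors the tragedy of the commons structure described above.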
