Multi-Agent Decision Processes for Space-Based Battle Management, Command & Control Systems

This paper develops a theoretical foundation, identifies key algorithms, and demonstrates the application of several critical algorithm components that make up a decentralized, distributed Battle Management, Command and Control (BMC2) systems architecture for a layered defense system. Although the focus is a space-based BMC2 system, the architecture and algorithms presented are equally applicable in the air, surface, and subsurface domains. The purpose of the BMC2 systems architecture is to enable the design, development, and discovery of cooperative, multi-agent decision policies that remain applicable in unrestricted, degraded, disrupted, and denied communication environments. Current technical approaches in this area tend to be heuristic. Instead, a Stochastic Games (SG) theoretical approach is employed to model simultaneous, zero-sum, two-team engagements involving multiple blue agents and red opponents, formulated as Multi-Stage Markov Stochastic Games (MSMSG). For this two-team MSMSG formulation, we demonstrate solutions for both the optimal heterogeneous-platform engagement policy and the optimal effector action policies for multiple engagements.
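As a concrete illustration of the SG machinery, the sketch below shows minimax value iteration for a small zero-sum Markov stochastic game: at each state, the stage game is solved as a linear program to obtain the security value and the blue team's optimal mixed strategy. The state space, rewards, and transition kernel are hypothetical placeholders, not the paper's engagement model; the LP formulation is the textbook one for two-person, zero-sum matrix games.

# Minimal sketch: the stage-game solve at the core of minimax value
# iteration for a zero-sum, two-team Markov stochastic game. All sizes,
# rewards, and transition probabilities below are illustrative stand-ins,
# not the engagement model developed in the paper.
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    # Value and optimal mixed strategy for the row (maximizing) player of
    # the zero-sum matrix game A, via the standard linear program:
    #   maximize v  subject to  A^T x >= v * 1,  sum(x) = 1,  x >= 0.
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                # linprog minimizes, so use -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - (A^T x)_j <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])                      # mixed strategy sums to one
    bounds = [(0, 1)] * m + [(None, None)]      # game value v is free in sign
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[-1], res.x[:m]

# Toy two-state game: R[s, a, o] is blue's stage payoff and P[s, a, o, s']
# the transition kernel, with a = blue action and o = red (opponent) action.
rng = np.random.default_rng(0)
n_states, n_blue, n_red, gamma = 2, 2, 2, 0.95
R = rng.uniform(-1.0, 1.0, size=(n_states, n_blue, n_red))
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_blue, n_red))

V = np.zeros(n_states)
for _ in range(200):                            # minimax value iteration
    Q = R + gamma * (P @ V)                     # Q[s, a, o]
    V = np.array([solve_matrix_game(Q[s])[0] for s in range(n_states)])

policy = [solve_matrix_game(Q[s])[1] for s in range(n_states)]
print("state values:", V)
print("blue mixed strategy per state:", policy)

In the full MSMSG setting, the same per-state stage-game solve would be applied over engagement states; the red team's equilibrium strategy is recoverable as the dual solution of the same linear program.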
