Autonomous systems will operate in highly contested environments in which adversaries must be assumed to be equally capable, agile, and informed. To achieve and sustain dominant performance in such environments, autonomous systems must adapt through online machine learning while managing and tolerating attrition - that is, they must improve their performance quickly, even over the duration of a single engagement, while accepting principled asset losses. Adapting effectively in such environments, however, poses novel challenges. We present an approach that leverages several recent innovations in reinforcement learning, distributed computing, and trusted consensus protocols such as blockchain. Multi-agent systems operating in contested environments must exploit their redundancy for learning while remaining resilient to component failures and compromises. In particular, to enable and accelerate learning, such systems must allow some number of components to operate sub-optimally in order to strike the exploration-exploitation balance needed for rapid and effective learning. Yet even as some components are potentially sacrificed to sub-optimal performance, the system's underlying mission must be maintained. This raises challenges in distributed trusted computing, such as Byzantine agreement problems. We present simulations that demonstrate these tradeoffs using epidemiological models.
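The abstract does not specify the epidemiological model used, but a minimal sketch of the general idea is a discrete-time SIR-style compartment model, reinterpreted here for swarm attrition: healthy agents become compromised at a contact-dependent rate and compromised agents are eventually removed (attrited). All names and parameter values below are illustrative assumptions, not taken from the paper.

```python
def simulate_attrition(s0=0.99, c0=0.01, r0=0.0,
                       beta=0.3, gamma=0.1, steps=200):
    """Discrete-time SIR-style compartment model for swarm attrition.

    s: fraction of healthy agents    (analogous to "susceptible")
    c: fraction of compromised agents (analogous to "infected")
    r: fraction of removed/attrited agents (analogous to "recovered")
    beta: compromise spread rate; gamma: removal rate (assumed values).
    Returns the list of (s, c, r) states over time.
    """
    s, c, r = s0, c0, r0
    history = [(s, c, r)]
    for _ in range(steps):
        new_compromises = beta * s * c   # spread scales with s*c contact term
        new_removals = gamma * c         # compromised agents attrited at rate gamma
        s -= new_compromises
        c += new_compromises - new_removals
        r += new_removals
        history.append((s, c, r))
    return history
```

Sweeping `beta` (adversarial pressure) against `gamma` (how aggressively compromised components are sacrificed) is one way such a model can expose the tradeoff between tolerating sub-optimal components and preserving the mission.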