Algorithmically identifying strategies in multi-agent game-theoretic environments

Artificial intelligence (AI) has enormous potential for military applications. Fully realizing the envisioned benefits of AI requires effective interactions among Soldiers and computational agents in highly uncertain and unconstrained operational environments. Because AI can be complex and unpredictable, computational agents should support their human teammates by adapting their behavior to the human's chosen strategy for a given task, facilitating mutually adaptive behavior within the team. While some situations entail explicit, easy-to-understand, top-down human strategies, more often than not human strategies are implicit, ad hoc, exploratory, and difficult to describe. To facilitate mutually adaptive human-agent team behavior, computational teammates must identify human strategies and adapt or modify their own behaviors to support them, with little or no a priori experience. One way to address this challenge is to train learning agents on examples of successful group strategies. Therefore, this paper focuses on an algorithmic approach to extracting group strategies from multi-agent teaming behaviors in a game-theoretic environment: predator-prey pursuit. Group strategies are illuminated with a new method inspired by graph theory. This method treats agents as vertices to generate a time series of group dynamics and analytically compares time series segments to identify coordinated group behaviors. Ultimately, this approach may lead to the design of agents that can recognize and fall in line with strategies implicitly adopted by human teammates. This work could substantially advance the field of human-agent teaming by facilitating natural interactions within heterogeneous teams.
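The abstract does not specify which graph quantity forms the time series or how segments are compared, so the following is only a minimal sketch under stated assumptions. It assumes each agent is a vertex of a complete graph whose edge weights are pairwise Euclidean distances, takes the mean edge weight (a "group dispersion" value) at each time step as the scalar time series, and compares fixed-length windows of that series with dynamic time warping (DTW). The function names (`group_dispersion`, `graph_time_series`, `dtw_distance`, `sliding_segments`, `best_match`), the dispersion metric, and the windowing scheme are hypothetical illustrations, not the authors' method.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def group_dispersion(positions: Sequence[Point]) -> float:
    """Mean edge weight of a complete graph whose vertices are the agents.

    Assumption (hypothetical): edge weight = Euclidean distance between agents.
    """
    edges = list(combinations(positions, 2))
    if not edges:  # fewer than two agents: no edges, so no dispersion
        return 0.0
    return sum(math.dist(a, b) for a, b in edges) / len(edges)


def graph_time_series(frames: Sequence[Sequence[Point]]) -> List[float]:
    """One scalar per time step: the group dispersion of that frame."""
    return [group_dispersion(frame) for frame in frames]


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping with absolute-difference cost."""
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # x advances alone
                                     cost[i][j - 1],      # y advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def sliding_segments(series: Sequence[float], window: int,
                     step: int = 1) -> List[List[float]]:
    """Fixed-length, possibly overlapping windows of the series."""
    return [list(series[i:i + window])
            for i in range(0, len(series) - window + 1, step)]


def best_match(query: Sequence[float], series: Sequence[float],
               step: int = 1) -> Tuple[int, float]:
    """Start index and DTW distance of the window most similar to `query`."""
    windows = sliding_segments(series, len(query), step)
    if not windows:
        raise ValueError("series is shorter than the query segment")
    scored = [(i * step, dtw_distance(query, w)) for i, w in enumerate(windows)]
    return min(scored, key=lambda pair: pair[1])  # first minimum wins ties


if __name__ == "__main__":
    # Hypothetical episode: three pursuers close in, then hold an encirclement.
    frames = [
        [(0, 0), (10, 0), (0, 10)],
        [(2, 2), (8, 2), (2, 8)],
        [(4, 4), (6, 4), (4, 6)],
        [(4, 4), (6, 4), (4, 6)],
    ]
    series = graph_time_series(frames)
    print([round(v, 2) for v in series])
    print(best_match(series[:2], series))
```

Under these assumptions, a "converging" maneuver shows up as a falling dispersion curve, and a "holding" phase as a flat one. DTW is used here instead of a pointwise distance because it tolerates teams executing the same maneuver at different speeds; other graph statistics (for example, per-vertex degree or centrality) could be substituted for the mean edge weight without changing the comparison step.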
