Decomposition techniques for Markov zero-sum games with nested information

Markov zero-sum games arise in applications such as network interdiction, where an informed defender protects a network against attacks. This problem has received significant attention in recent years because of its relevance to military operations and network security. In this paper, we focus on finite games in which the attacker has imperfect knowledge of the network state, and we formulate the problem as a Markov game with nested information. By exploiting the nested information structure, we decompose the multistage game into a sequence of one-stage subgames and develop an algorithm that computes both the value of the game and its saddle-point strategies. The decomposition computes the value by backward induction, as in stochastic dynamic programming, and then identifies saddle-point strategies that achieve this value. Using the Markov structure of the game, we show that the value can be computed efficiently in terms of a single value function of an information state at each stage, so that the resulting single-stage optimization problems are much smaller than the original multistage game. We illustrate our results with an example of multistage network interdiction in which the attacker may not be able to observe the outcomes of its attacks.
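To make the backward-induction structure concrete, the sketch below solves a finite-horizon zero-sum Markov game with perfect state information stage by stage, treating each one-stage subgame as a matrix game solved by linear programming. This is an illustrative skeleton, not the paper's algorithm: the nested-information decomposition described above operates on an information state (a belief over network states) rather than the raw state, and the function names, data layout, and the scipy-based LP solve are assumptions made here for illustration.

```python
# Minimal sketch (assumed names and data layout, not the paper's method):
# backward induction for a finite-horizon zero-sum Markov game with
# perfect state information, solving each one-stage matrix game by LP.
import numpy as np
from scipy.optimize import linprog


def solve_matrix_game(M):
    """Value and maximizer (row-player) mixed strategy of the matrix game M."""
    m, n = M.shape
    # Decision variables: [x_1, ..., x_m, v]; maximize v <=> minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every column j:  v - sum_i x_i * M[i, j] <= 0.
    A_ub = np.hstack([-M.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Mixed-strategy probabilities sum to one.
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]


def backward_induction(reward, trans, horizon):
    """reward[t][s]: (m x n) stage payoff matrix; trans[t][s]: (m x n x S) kernel."""
    S = len(reward[0])
    V = np.zeros(S)                      # terminal value is zero
    policies = []
    for t in reversed(range(horizon)):
        V_new = np.zeros(S)
        stage_policy = []
        for s in range(S):
            # One-stage game: immediate payoff plus expected value-to-go.
            M = reward[t][s] + trans[t][s] @ V
            V_new[s], x = solve_matrix_game(M)
            stage_policy.append(x)
        policies.append(stage_policy)
        V = V_new
    return V, list(reversed(policies))
```

In this skeleton each stage reduces to a small LP whose size depends only on the per-stage action sets, which is the sense in which the single-stage problems are much smaller than the original multistage game; the paper's decomposition obtains the same kind of reduction, but with the value function defined over an information state rather than the observed state.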
