Towards solving 2-TBSG efficiently

The two-player turn-based stochastic game (2-TBSG) is a two-player game model in which the goal is to find a Nash equilibrium; it is widely used in reinforcement learning and AI. The simplex method for deterministic discounted Markov decision processes is known to be strongly polynomial independent of the discount factor. Motivated by this, we address the open problem of whether a similar algorithm exists for 2-TBSGs. We develop a simplex strategy iteration, in which one player updates its strategy with a simplex step and the other player then responds with an optimal counterstrategy, as well as a modified simplex strategy iteration. Both belong to a class of geometrically converging algorithms. We establish the strongly polynomial property of these algorithms by considering a strategy that combines the current strategy with the equilibrium strategy. Moreover, we present a method for transforming general 2-TBSGs into special 2-TBSGs in which each state has exactly two actions.
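To make the alternating structure of simplex strategy iteration concrete, the following is a minimal Python sketch, not the paper's exact procedure. Several choices are assumptions made for illustration: player 0 maximizes the discounted reward and player 1 minimizes it, the simplex step uses a Dantzig-style rule (switch the single action with the largest gain), the counterstrategy is computed by plain policy iteration, and the names `actions`, `owner`, `best_response`, and `simplex_strategy_iteration` are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(actions, policy, gamma):
    """Values of a fixed joint policy: solve (I - gamma * P) v = r."""
    n = len(actions)
    A, b = [], []
    for s in range(n):
        reward, probs = actions[s][policy[s]]
        A.append([(1.0 if s == t else 0.0) - gamma * probs[t] for t in range(n)])
        b.append(reward)
    return solve_linear(A, b)


def q_value(actions, s, a, v, gamma):
    """One-step lookahead value of action a in state s under values v."""
    reward, probs = actions[s][a]
    return reward + gamma * sum(p * x for p, x in zip(probs, v))


def best_response(actions, owner, policy, gamma, eps=1e-9):
    """Optimal counterstrategy of the minimizer (player 1, an assumption),
    found by policy iteration with the maximizer's choices held fixed."""
    policy = policy[:]
    while True:
        v = evaluate(actions, policy, gamma)
        changed = False
        for s in range(len(actions)):
            if owner[s] != 1:
                continue
            best = min(range(len(actions[s])),
                       key=lambda a: q_value(actions, s, a, v, gamma))
            if q_value(actions, s, best, v, gamma) < v[s] - eps:
                policy[s] = best
                changed = True
        if not changed:
            return policy, v


def simplex_strategy_iteration(actions, owner, gamma, eps=1e-9):
    """Alternate a single-switch simplex step for the maximizer (player 0)
    with a full best response of the minimizer (player 1)."""
    policy = [0] * len(actions)
    while True:
        policy, v = best_response(actions, owner, policy, gamma, eps)
        # Dantzig-style pivot (assumed rule): pick the one maximizer action
        # whose one-step value exceeds the current value by the most.
        gain, s_star, a_star = max(
            ((q_value(actions, s, a, v, gamma) - v[s], s, a)
             for s in range(len(actions)) if owner[s] == 0
             for a in range(len(actions[s]))),
            default=(0.0, -1, -1))
        if gain <= eps:
            return policy, v  # no improving switch: equilibrium reached
        policy[s_star] = a_star


# Toy game (made up for illustration): actions[s] is a list of
# (reward, transition probabilities) pairs; owner[s] is the controlling player.
actions = [
    [(0.0, [0.0, 1.0]), (1.0, [1.0, 0.0])],  # state 0, maximizer
    [(4.0, [0.0, 1.0]), (0.0, [1.0, 0.0])],  # state 1, minimizer
]
owner = [0, 1]
policy, values = simplex_strategy_iteration(actions, owner, gamma=0.5)
print(policy, values)  # policy [1, 1], values close to [2.0, 1.0]
```

In this toy game the maximizer starts by moving to state 1, the minimizer answers by moving back to state 0, and a single simplex switch (staying in state 0 for reward 1) then yields the equilibrium. Switching only one action per round is what distinguishes the simplex variant from standard strategy iteration, which would switch every improvable state at once.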
