Finding Correlated Equilibria in General Sum Stochastic Games

Problems often arise in which multiple self-interested agents with individual goals can coordinate their actions to improve their outcomes. We model these problems as general sum stochastic games, and we develop a tractable approximation algorithm for computing subgame-perfect correlated equilibria in such games. Our algorithm extends standard dynamic programming methods such as value iteration and Q-learning, and it is conservative: while it is not guaranteed to find every value vector achievable in correlated equilibrium, any policy it does find is guaranteed to be an exact equilibrium of the stochastic game (to within limits of accuracy that depend on the number of backups rather than on the approximation scheme). Our new algorithm is based on the planning algorithm of [1]. That algorithm computes subgame-perfect Nash equilibria, but assumes it is given a set of “punishment policies” as input. Our new algorithm requires only the description of the game, an important improvement since suitable punishment policies may be difficult to come by.
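
As a concrete illustration of the kind of per-state computation a dynamic-programming approach of this sort relies on, the sketch below finds a correlated equilibrium of a single two-player stage game by linear programming. This is a standard textbook construction, not the paper's backup operator; the Chicken payoff matrices, the welfare-maximizing objective, and the use of `scipy.optimize.linprog` are all illustrative assumptions.

```python
# Hedged sketch: a correlated equilibrium of one two-player stage game via an LP.
# Not the paper's algorithm; payoffs and objective are chosen for illustration.
import numpy as np
from scipy.optimize import linprog

def correlated_equilibrium(A, B):
    """Return a joint distribution p[i, j] over action pairs that is a
    correlated equilibrium of the bimatrix game (A, B), maximizing the
    sum of the two players' expected payoffs."""
    m, n = A.shape
    num_vars = m * n                      # one probability per joint action (i, j)
    idx = lambda i, j: i * n + j

    # Incentive constraints, written as A_ub @ p <= 0.
    rows = []
    # Row player: deviating from recommended action i to i2 must not help.
    for i in range(m):
        for i2 in range(m):
            if i2 == i:
                continue
            row = np.zeros(num_vars)
            for j in range(n):
                row[idx(i, j)] = A[i2, j] - A[i, j]
            rows.append(row)
    # Column player: deviating from recommended action j to j2 must not help.
    for j in range(n):
        for j2 in range(n):
            if j2 == j:
                continue
            row = np.zeros(num_vars)
            for i in range(m):
                row[idx(i, j)] = B[i, j2] - B[i, j]
            rows.append(row)
    A_ub = np.array(rows)
    b_ub = np.zeros(len(rows))

    # Probabilities are nonnegative and sum to one.
    A_eq = np.ones((1, num_vars))
    b_eq = np.array([1.0])

    # Maximize total expected payoff (linprog minimizes, so negate).
    c = -(A + B).reshape(num_vars)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * num_vars)
    return res.x.reshape(m, n)

# Example: the game of Chicken, where correlated equilibria can improve on
# the payoffs available in Nash equilibrium.
A = np.array([[6.0, 2.0], [7.0, 0.0]])   # row player's payoffs
B = np.array([[6.0, 7.0], [2.0, 0.0]])   # column player's payoffs
print(correlated_equilibrium(A, B).round(3))
```

In a value-iteration-style extension like the one described above, a computation of this kind would presumably be performed at each state, with continuation values folded into the stage payoffs; the precise backup operator is specified in the paper itself.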