Markov games are a generalization of Markov decision processes to a multi-agent setting. The two-player zero-sum Markov game framework offers an effective platform for designing robust controllers. This paper presents two novel controller design algorithms that use ideas from the game-theory literature to produce reliable controllers able to maintain performance in the presence of noise and parameter variations. A more widely used approach to robust controller design is H∞ optimal control, which suffers from high computational demand and may at times be infeasible. Our approach generates an optimal control policy for the agent (controller) via a simple linear program, enabling the controller to learn about the unknown environment. The controller faces an unknown environment, which in our formulation corresponds to the behavior rules of the noise, modeled as the opponent. The proposed controller architectures attempt to improve controller reliability by gradually mixing algorithmic approaches drawn from the game-theory literature with the Minimax-Q Markov game solution approach, in a reinforcement-learning framework. We test the proposed algorithms on a simulated inverted-pendulum swing-up task and compare their performance against standard Q-learning.

Keywords—Reinforcement learning, Markov Decision Process, Matrix Games, Markov Games, Smooth Fictitious Play, Controller, Inverted Pendulum.
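The linear-program step the abstract refers to is the one at the heart of Minimax-Q: at each state, the agent's mixed strategy and the game value are obtained by solving the zero-sum matrix game defined by the current Q-values. A minimal sketch of that LP, using `scipy.optimize.linprog` and an illustrative payoff matrix (the function name and the matching-pennies example are ours, not from the paper):

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(Q):
    """Maximin mixed strategy and value of a zero-sum matrix game.

    Q[a, o] is the agent's payoff for taking action a while the
    opponent (here, the noise) takes action o.  The LP maximizes v
    subject to: for every opponent action o, the agent's expected
    payoff under strategy pi is at least v.
    """
    n_actions, n_opp = Q.shape
    # Decision variables: [pi_1, ..., pi_n, v]; maximize v == minimize -v.
    c = np.zeros(n_actions + 1)
    c[-1] = -1.0
    # For each opponent action o:  v - sum_a pi_a * Q[a, o] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((n_opp, 1))])
    b_ub = np.zeros(n_opp)
    # The strategy must be a probability distribution.
    A_eq = np.hstack([np.ones((1, n_actions)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_actions + [(None, None)]  # v is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_actions], res.x[-1]

# Matching pennies: the maximin strategy is uniform and the value is 0.
pi, v = solve_matrix_game(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```

In Minimax-Q this solve replaces the `max` over actions in the standard Q-learning update, so the learned policy hedges against the worst-case opponent rather than assuming benign noise.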