Multi-Agent Residual Advantage Learning with General Function Approximation.
Abstract: A new algorithm, advantage learning, is presented that improves on advantage updating by requiring that a single function be learned rather than two. Furthermore, advantage learning requires only a single type of update, the learning update, while advantage updating requires two different types of updates: a learning update and a normalization update. The reinforcement learning system uses the residual form of advantage learning. An application of reinforcement learning to a Markov game is presented. The test-bed has continuous states and nonlinear dynamics. The advantage function is stored in a single-hidden-layer sigmoidal network. Speed of learning is increased by a new algorithm, Incremental Delta-Delta (IDD), which extends Jacobs' (1988) Delta-Delta for use in incremental training, and differs from Sutton's (1992) Incremental Delta-Bar-Delta (IDBD) in that it does not require the use of a trace and is amenable to use with general function approximation systems. To our knowledge, this is the first time an approximate second-order method has been used with residual algorithms. Empirical results are presented comparing convergence rates with and without the use of IDD for the reinforcement learning test-bed and for a supervised learning test-bed.
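To make the "residual form of advantage learning" concrete, here is a minimal sketch in Python of one weight update for a linear advantage approximator. Everything specific in it is an assumption for illustration, not taken from the paper: the feature function, the constants alpha, gamma, dt, and K, and the mixing weight phi that blends the direct (Q-learning-like) update with the pure residual-gradient update, as in Baird's residual algorithms.

    import numpy as np

    # Sketch of one residual advantage-learning update for a linear
    # approximator A(x, u) = w . f(x, u).  All constants and the feature
    # map are illustrative assumptions, not the paper's.

    def features(x, u):
        # Hypothetical features for state x (array) and scalar action u.
        return np.concatenate([x, [u, 1.0]])

    def advantage(w, x, u):
        return w @ features(x, u)

    def residual_advantage_update(w, x, u, r, x_next, actions,
                                  alpha=0.1, gamma=0.99,
                                  dt=0.1, K=1.0, phi=0.5):
        b_cur  = max(actions, key=lambda b: advantage(w, x, b))
        b_next = max(actions, key=lambda b: advantage(w, x_next, b))
        a_max      = advantage(w, x, b_cur)
        a_next_max = advantage(w, x_next, b_next)

        # Advantage-learning target: the Bellman correction is scaled by
        # 1/(dt*K), widening the gap between optimal and other actions.
        target = a_max + (r + gamma * a_next_max - a_max) / (dt * K)
        delta = target - advantage(w, x, u)          # Bellman residual

        # Gradients of both sides of the residual (linear case; the max
        # operators are treated as locally linear in their argmax).
        grad_current = features(x, u)
        grad_target = ((1.0 - 1.0 / (dt * K)) * features(x, b_cur)
                       + gamma / (dt * K) * features(x_next, b_next))

        # Residual algorithm: phi = 0 recovers the direct update,
        # phi = 1 the pure residual-gradient update.
        return w + alpha * delta * (grad_current - phi * grad_target)

With phi between 0 and 1 the update trades the fast but potentially divergent direct method against the slower but convergent residual-gradient method, which is the point of using the residual form with a sigmoidal network.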
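The abstract describes IDD only at a high level, so the following is a hedged reading of it: like Jacobs' Delta-Delta, each weight keeps its own step size, adapted from the product of the current and previous gradient components; unlike Sutton's IDBD, no decayed trace is kept, only the single previous gradient. The exponential parameterization of the step sizes and the meta-rate theta are our assumptions.

    import numpy as np

    # Hedged sketch of an IDD-style optimizer: per-weight step sizes
    # adapted online from the agreement of successive gradients.  Only
    # "no trace is required" comes from the text; the rest is assumed.

    class IDD:
        def __init__(self, n_weights, init_log_rate=-4.0, theta=0.01):
            self.theta = theta                             # meta learning rate
            self.beta = np.full(n_weights, init_log_rate)  # per-weight log rates
            self.prev_grad = np.zeros(n_weights)           # one past gradient,
                                                           # not a decayed trace

        def step(self, w, grad):
            # Grow a weight's rate when this gradient agrees with the
            # previous one, shrink it when they conflict (Delta-Delta's
            # correlation rule, made incremental via one stored gradient).
            self.beta += self.theta * grad * self.prev_grad
            self.prev_grad = grad.copy()
            return w - np.exp(self.beta) * grad            # per-weight SGD step

Because the rule touches only the current weight vector and its gradient, it plugs into any differentiable approximator, which matches the claim that IDD suits general function approximation systems.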