Paper Information - Learning Multidimensional Control Actions From Delayed Reinforcements

This paper addresses the problem of learning multidimensional control actions from delayed rewards. Classical reinforcement learning algorithms can be applied to tasks with multidimensional action spaces by recoding the action space appropriately (transforming it artificially to a single dimension), but this straightforward recoding approach suffers from significant inefficiencies. An alternative approach to applying Q-learning to tasks with vector actions is proposed, called Q-V-learning. Experimental results are presented where this algorithm clearly outperforms the simple recoding approach, while it is associated with a much lower computational expense.
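To make the recoding approach mentioned in the abstract concrete, the sketch below flattens a vector action into a single index using a mixed-radix encoding. It illustrates only the simple recoding baseline, not Q-V-learning, whose details are not given here. The function names (`encode_action`, `decode_action`, `flat_action_count`) and the example component sizes are assumptions chosen for illustration; they do not come from the paper.

```python
def encode_action(action, sizes):
    """Map a vector action to one integer index (mixed-radix encoding).

    `action` holds one discrete choice per dimension; `sizes` holds the
    number of choices available in each dimension (hypothetical names).
    """
    index = 0
    for component, size in zip(action, sizes):
        if not 0 <= component < size:
            raise ValueError("action component out of range")
        index = index * size + component
    return index


def decode_action(index, sizes):
    """Invert encode_action: recover the vector action from its index."""
    components = []
    for size in reversed(sizes):
        index, component = divmod(index, size)
        components.append(component)
    return tuple(reversed(components))


def flat_action_count(sizes):
    """Number of single-dimension actions after recoding (product of sizes)."""
    count = 1
    for size in sizes:
        count *= size
    return count


if __name__ == "__main__":
    sizes = (3, 4, 2)          # assumed example: three action dimensions
    action = (2, 3, 1)
    flat = encode_action(action, sizes)
    print(flat, decode_action(flat, sizes), flat_action_count(sizes))
```

The flattened action count is the product of the per-dimension sizes, so it grows exponentially with the number of dimensions. A tabular learner over the recoded space must keep and explore one value per flattened action in every state, which is one plausible reading of the inefficiency the abstract refers to.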

Paweł Cichosz

[1] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.

[2] C. Watkins. Learning from delayed rewards, 1989.

[3] Richard W. Prager, et al. A Modular Q-Learning Architecture for Manipulator Task Decomposition, 1994, ICML.

[4] Paweł Cichosz, et al. Reinforcement Learning Algorithms Based on the Methods of Temporal Differences, 1994.

[5] Paweł Cichosz. Truncating Temporal Differences: On the Efficient Implementation of TD(λ) for Reinforcement Learning, 1995.