Learning Multidimensional Control Actions From Delayed Reinforcements

This paper addresses the problem of learning multidimensional control actions from delayed rewards. Classical reinforcement learning algorithms can be applied to tasks with multidimensional action spaces by recoding the action space appropriately (transforming it artificially to a single dimension), but this straightforward recoding approach suffers from significant inefficiencies. An alternative approach to applying Q-learning to tasks with vector actions is proposed, called Q-V-learning. Experimental results are presented in which this algorithm clearly outperforms the simple recoding approach while incurring a much lower computational cost.
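
As a minimal sketch (not taken from the paper itself) of the straightforward recoding approach, the snippet below flattens a hypothetical three-dimensional discrete action space into a single scalar index so that standard tabular Q-learning can be applied; the dimension counts, the `encode`/`decode` helpers, and the table sizes are illustrative assumptions. The final lines make the inefficiency concrete: the recoded action set, and hence the Q-table width, grows exponentially with the number of action dimensions.

```python
import numpy as np

# Hypothetical setting: a 3-dimensional action space where each
# component takes one of 5 discrete values. The recoding maps each
# action vector to a single scalar index for ordinary Q-learning.
DIMS = 3            # number of action dimensions (assumed)
VALUES_PER_DIM = 5  # discrete settings per dimension (assumed)
NUM_STATES = 100    # illustrative state-space size

def encode(action_vector):
    """Map a vector action, e.g. (2, 0, 4), to a single action index."""
    index = 0
    for component in action_vector:
        index = index * VALUES_PER_DIM + component
    return index

def decode(index):
    """Recover the vector action from its scalar index."""
    components = []
    for _ in range(DIMS):
        components.append(index % VALUES_PER_DIM)
        index //= VALUES_PER_DIM
    return tuple(reversed(components))

# The recoded action set grows exponentially with the number of
# dimensions -- the source of the inefficiency the paper points out:
num_recoded_actions = VALUES_PER_DIM ** DIMS      # 125 actions here
q_table = np.zeros((NUM_STATES, num_recoded_actions))

assert decode(encode((2, 0, 4))) == (2, 0, 4)
```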