Reinforcement learning plays a major role in the adaptive behaviour of autonomous robots, but in real-world environments reinforcement learning techniques such as Q-learning face serious difficulties because of rapidly growing search spaces. We explore the characteristics of discrete Q-learning in a high-dimensional continuous setting: a simulated robot grasping task. The very simple sensors in this setup permit only a rather coarse identification of the actual "physical" state, leading to the effect known as "perceptual aliasing". We identify parameters, in particular the sensory sampling rate, that directly control how general the learned policies are. When the more general policies perform the coarse positioning, the effects of ambiguity can be suppressed, so the system finds feasible grasping positions after only a few exploratory actions. We suggest that a neural network supporting the Q-learning could improve the overall performance significantly.
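To make the setup concrete, the following is a minimal sketch of tabular Q-learning over a coarsely discretized sensor space, of the kind the abstract describes. The environment interface (`env.reset()`, `env.step()`, `env.actions`), the state variables, and the `resolution` parameter standing in for the sensory sampling rate are all hypothetical illustrations, not the paper's actual task; the sketch only shows how a coarser discretization merges physical states (producing perceptual aliasing) while keeping the Q-table small and the policy general.

```python
# Sketch: tabular Q-learning with a coarse sensor discretization.
# `env` is a hypothetical placeholder with reset(), step(action) -> (sensors, reward, done),
# and a discrete action set `env.actions`; it is NOT the paper's simulated grasping task.
import random
from collections import defaultdict

def discretize(sensor_values, resolution):
    """Map continuous sensor readings onto a coarse discrete state.

    A lower `resolution` (coarser sensory sampling) merges more physical
    states into one perceived state: this produces perceptual aliasing,
    but it also yields a smaller table and a more general policy.
    """
    return tuple(int(v * resolution) for v in sensor_values)

def q_learning_episode(env, q_table, resolution,
                       alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100):
    """Run one episode of one-step Q-learning over the discretized states."""
    state = discretize(env.reset(), resolution)
    for _ in range(max_steps):
        # Epsilon-greedy exploration over the discrete action set.
        if random.random() < epsilon:
            action = random.choice(env.actions)
        else:
            action = max(env.actions, key=lambda a: q_table[(state, a)])
        sensor_values, reward, done = env.step(action)
        next_state = discretize(sensor_values, resolution)
        # Standard one-step Q-update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
        best_next = max(q_table[(next_state, a)] for a in env.actions)
        q_table[(state, action)] += alpha * (
            reward + gamma * best_next - q_table[(state, action)])
        state = next_state
        if done:
            break

q_table = defaultdict(float)  # maps (state, action) -> Q-value, missing entries default to 0
```

In this reading, raising `resolution` reduces aliasing at the cost of a rapidly growing table, which is exactly the trade-off the abstract attributes to the sensory sampling rate.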