This paper treats the obstacle avoidance and control task for a multi-linked manipulator as an interaction between a learning agent and an unknown environment. The role of the agent is to generate actions that maximise the reward it receives from the environment. We demonstrate how two learning algorithms common in the reinforcement learning literature, the adaptive heuristic critic (AHC) (Barto et al., 1983) and Q-learning (Watkins, 1989), can be used to solve the task successfully in two different ways: 1) through the generation of position commands to a PD controller, which produces torque commands to drive the manipulator, and 2) through the direct generation of torque commands, removing the need for a PD controller. In the process, the inverse kinematics problem for multi-linked manipulators is solved automatically. Fast function approximation is achieved through the use of an array of cerebellar model arithmetic computers (CMAC). The generation of both discrete and continuous actions is investigated, and the performance of the algorithms is evaluated in terms of learning rates, efficiency of solutions, and memory requirements.
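The paper itself gives no code; as an illustration only, the sketch below shows one common way to pair Q-learning with a CMAC (tile-coding) function approximator over a continuous state space with discrete actions, in the spirit of the approach described above. The CMACQ class, the env.step interface, and all parameter values are assumptions made for this sketch, not the authors' implementation.

```python
import numpy as np

class CMACQ:
    """Minimal CMAC (tile-coding) approximator for discrete-action Q-values:
    several overlapping grid tilings over the state space; Q(s, a) is the sum
    of one weight per tiling. Illustrative sketch, not the paper's code."""

    def __init__(self, n_tilings, tiles_per_dim, low, high, n_actions):
        self.n_tilings = n_tilings
        self.tiles = tiles_per_dim
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.n_dims = self.low.size
        self.n_actions = n_actions
        self.w = np.zeros((n_tilings, tiles_per_dim ** self.n_dims, n_actions))
        # each tiling is displaced by a different fraction of one tile width
        self.shifts = np.linspace(0.0, 1.0, n_tilings, endpoint=False)

    def _cells(self, state):
        # normalise the state into the grid and find the active cell per tiling
        x = (np.asarray(state, dtype=float) - self.low) / (self.high - self.low)
        x = np.clip(x, 0.0, 1.0) * (self.tiles - 1)
        for t, shift in enumerate(self.shifts):
            coords = np.clip(np.floor(x + shift).astype(int), 0, self.tiles - 1)
            yield t, int(np.ravel_multi_index(coords, (self.tiles,) * self.n_dims))

    def q(self, state):
        # vector of Q(state, a) for all actions: sum of one weight per tiling
        return sum(self.w[t, c] for t, c in self._cells(state))

    def update(self, state, action, target, alpha):
        # move the active weights toward the TD target, split across tilings
        err = target - self.q(state)[action]
        for t, c in self._cells(state):
            self.w[t, c, action] += (alpha / self.n_tilings) * err


def q_learning_step(qfun, env, state, epsilon=0.1, alpha=0.5, gamma=0.95):
    """One epsilon-greedy Q-learning transition; env.step is a hypothetical
    interface assumed to return (next_state, reward, done)."""
    if np.random.rand() < epsilon:
        action = np.random.randint(qfun.n_actions)
    else:
        action = int(np.argmax(qfun.q(state)))
    next_state, reward, done = env.step(action)
    target = reward if done else reward + gamma * np.max(qfun.q(next_state))
    qfun.update(state, action, target, alpha)
    return next_state, done
```

Summing one weight from each of several offset tilings is what gives the CMAC its fast, local generalisation; the same structure could in principle back either the Q-learning variant or the AHC's critic, though the paper's actual architectures may differ.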
[1] James S. Albus, et al., "New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC)," 1975.
[2] Richard S. Sutton, et al., "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Transactions on Systems, Man, and Cybernetics, 1983.
[3] Richard S. Sutton, et al., "Temporal credit assignment in reinforcement learning," 1984.
[4] Richard S. Sutton, et al., "Learning and Sequential Decision Making," 1989.
[5] Richard S. Sutton, et al., "Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming," ML, 1990.
[6] Eduard Aved’yan, et al., "The Cerebellar Model Articulation Controller (CMAC)," 1995.
[7] Ben J. A. Kröse, et al., "Learning from delayed rewards," Robotics Auton. Syst., 1995.