This paper shows how an omnidirectional robot can learn to correct inaccuracies when driving, or even learn to use corrective motor commands when a motor fails, whether partially or completely. Driving inaccuracies are unavoidable, since not all wheels have the same grip on the surface, and not all motors provide exactly the same power. When a robot starts driving, the real system response differs from the ideal behavior assumed by the control software. Malfunctioning motors are also a fact of life that we have to take into account. Our approach is to let the control software learn how the robot reacts to the instructions sent from the control computer. We use a neural network or a linear model to learn the robot's response to commands. The model can be used to predict deviations from the desired path and take corrective action in advance, thus increasing the driving accuracy of the robot. The model can also be used to monitor the robot and assess whether it is performing according to its learned response function. If it is not, the new response function of the malfunctioning robot can be learned and updated. We show that even if a robot loses power from a motor, the system can re-learn to drive the robot along a straight path, even if the robot is a black box and we are not aware of how the commands are applied internally.

1 Motivation: Robots are Imprecise

The FU-Fighters participated in RoboCup 2004 in Lisbon. The robots had a combination of motors and electronics that performed satisfactorily until we started losing some of the motors. We drive the motors above their specification, as almost all teams do, and during a long tournament it can happen that motors get damaged. Our omnidirectional robots have four wheels, so the remaining three motors could still be used, but the robot oscillated wildly because the PID controller tried to operate with all four motors. A natural question to ask is therefore whether the high-level control can observe the problem through the computer vision system and take corrective action. We would like to send the right commands without having to modify the robot, and without changing the PID controller in the robot's electronics.

Fig. 1. An omnidirectional robot (left) receives commands for driving in a star-shaped path starting from the origin (right; trajectories plotted without correction). The robot keeps its orientation towards the north and has difficulties driving straight along the diagonals.

A related problem is that of driving a robot with an imperfect chassis. When the wheels and motors are mounted, it can happen that one wheel ends up with an extra millimeter or two of axle length. More frequently, the motors themselves are different: some are older, some are new, and for the same PWM signal the older motors do not provide exactly the same torque as the newer ones. The PID controller takes care of equalizing the motor speeds, but when the robot starts, before the PID controller can become active, the differences can be large enough to send the robot in a slightly different direction. Differences in the grip of the wheels on the floor can exacerbate the problem. It is difficult to adjust the PID controller to cope with all eventualities, also because such differences are dynamic. Fig. 1 shows how a robot drives when trying to move very fast along a star-shaped path: the inaccuracies are largest along the diagonals, but are present in all eight directions.
The first solution that comes to mind for the problems mentioned above is to build a perfect physical model of the robot, which could be used to predict the robot's behavior whenever something happens (for example, a motor delivers only half of its torque, or an axle is 3 mm longer than the others). However, it is difficult to derive a good analytic approximation of the real robot, and since the robots differ, we would need one model for each robot. The solution we propose here is to let the vision system learn the behavior of the robot “on the fly”, from observations collected by the cameras. The learned robot response is as good as an analytical model if it can be used to predict the robot's reaction to the next command. We can then anticipate whether the robot will perform according to our wishes, and take corrective action in advance. Think of a soldier who mixes up right with left: you give the command “turn right”, and the soldier turns left. After two or three such experiences, you simply order the soldier to turn left when you want a right turn, and vice versa; the soldier now behaves as desired. Moreover, we propose to dynamically retrain the robot's behavior predictor, so that whenever the system observes that the current predictor is not accurate enough, a new predictor is computed. Using the new predictor, we can again anticipate what a malfunctioning robot will do with the next command, and then take corrective action. The work described here takes corrective action purely in software: the robot learns to heal itself.

Related work on coping with differences in the hardware platform, although at a higher behavioral level, has been described by Kleiner [9]: the authors teach slightly different robots to shoot a ball optimally, using reinforcement learning to learn the correct activation of the required behavior. In a previous paper, we described how to apply learning algorithms to optimize the PID controller needed for driving an omnidirectional robot [7]; this is learning applied at the lowest hardware level. There is general interest in fault-tolerant architectures for robotic control: parallel control architectures can be used [8], and self-repairing strategies have been investigated for 3D motion planning [5] and for production systems [10]. Eventually, self-repairing robots will be built.

2 Learning the Robot's Behavior

We started applying predictors for robot behavior when our robots became too fast for the existing system delay. A robot driving at 2 m/s moves 20 cm in 100 ms, which can be the difference between stopping just in front of another robot and colliding with it. Another advantage of learning the behavior of robots in response to commands is that the predictor can be used in a simulator.
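To make the retraining idea concrete before describing how the predictor itself is trained, here is a minimal sketch of the monitor-and-retrain loop proposed in the previous section. It is our illustration, not code from the actual system: at every vision frame the predicted position is compared with the observed one, and when the mean error over a window of recent frames grows too large, a new predictor is fitted from freshly logged data. The class name, window length, and threshold are illustrative assumptions.

```python
import numpy as np
from collections import deque

ERROR_WINDOW = 60        # frames over which the prediction error is averaged (assumed)
ERROR_THRESHOLD = 0.05   # tolerated mean error in metres (assumed)

class PredictorMonitor:
    """Watches a learned behavior predictor and triggers retraining when
    the robot's observed response drifts away from the learned one."""

    def __init__(self, predictor, retrain_fn):
        self.predictor = predictor    # callable: recent history -> predicted position
        self.retrain_fn = retrain_fn  # callable: logged data -> new predictor
        self.errors = deque(maxlen=ERROR_WINDOW)

    def step(self, history, observed_pos, log):
        predicted = self.predictor(history)
        self.errors.append(float(np.linalg.norm(predicted - observed_pos)))
        # A persistently large error means the response function has changed
        # (e.g. a motor started failing): learn a new one from the log.
        if (len(self.errors) == ERROR_WINDOW
                and np.mean(self.errors) > ERROR_THRESHOLD):
            self.predictor = self.retrain_fn(log)
            self.errors.clear()
        return predicted
```

Clearing the error window after retraining keeps stale errors, accumulated under the old response function, from immediately triggering another retraining pass.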
In our control system loop, the only external physical sensors used for behavior control are two video cameras (the motors on the robots have pulse counters, but this information is not available to the off-the-field controlling computer). The global computer vision system analyzes the video images and outputs the positions and orientations of the robots and the ball. Our adaptive vision system is described in detail in [11, 4]. The data available for behavior control therefore comes from the past: the frame captured at time t takes a certain time to reach the computer and be processed, and once the video image has been processed, it takes additional time for the wireless command to reach the robot, to be decoded, and to be executed. All of this elapsed time is the system delay, which in our small-size platform varies between 100 and 150 ms.

To cope with the system delay, we let a neural network or a linear associator learn the correspondence between past positions and commands sent to the robot (for example, during the last six observed frames) and the future positions (from one to four frames in advance). This data is collected simply by driving the robot around the field and logging all positions and commands.
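As an illustration of this setup, the following sketch (again our own, not the original implementation) fits a linear associator by regularized least squares, using the six-frame window and the one-to-four-frame horizons mentioned above; the per-frame layout (x, y, theta for states, vx, vy, omega for commands) and the ridge term are assumptions made for the example.

```python
import numpy as np

PAST = 6  # number of past frames fed to the predictor

def build_dataset(states, commands, ahead):
    """states: (T, 3) logged (x, y, theta); commands: (T, 3) logged
    (vx, vy, omega). Pairs each six-frame window with the position
    `ahead` frames later (ahead = 1..4, one model per horizon)."""
    X, Y = [], []
    for t in range(PAST, len(states) - ahead):
        window = np.concatenate([states[t - PAST:t].ravel(),
                                 commands[t - PAST:t].ravel()])
        X.append(np.append(window, 1.0))   # constant bias input
        Y.append(states[t + ahead, :2])    # future (x, y)
    return np.asarray(X), np.asarray(Y)

def fit_linear_associator(X, Y, ridge=1e-6):
    """Regularized least squares: weights W with X @ W ~= Y."""
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)

def predict(W, states, commands):
    """Position predicted `ahead` frames past the newest logged frame."""
    x = np.concatenate([states[-PAST:].ravel(),
                        commands[-PAST:].ravel(), [1.0]])
    return x @ W
```

Fitting one weight matrix per horizon (ahead = 1, ..., 4) lets the controller look up, at every frame, where the robot will be when the command currently being issued actually takes effect.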
References

[1] Sven Behnke et al. Predicting away the Delay. 2003.
[2] Raúl Rojas et al. Learning to Drive and Simulate Autonomous Mobile Robots. RoboCup, 2004.
[3] Sven Behnke et al. A Hierarchy of Reactive Behaviors Handles Complexity. Balancing Reactivity and Social Deliberation in Multi-Agent Systems, 2000.
[4] Alexander H. Jackson et al. Robot fault-tolerance using an embryonic array. Proceedings of the NASA/DoD Conference on Evolvable Hardware, 2003.
[5] Sven Behnke et al. Predicting Away Robot Control Latency. RoboCup, 2003.
[6] Sven Behnke et al. Robust Real Time Color Tracking. RoboCup, 2000.
[7] I. Praehofer et al. Supervising manufacturing system operation by DEVS-based intelligent control. Fifth Annual Conference on AI, Simulation, and Planning in High Autonomy Systems, 1994.
[8] Bernhard Nebel et al. Towards a Life-Long Learning Soccer Agent. RoboCup, 2002.