Paper information - Learning hill-climbing functions as a strategy for generating behaviors in a mobile robot

International Conference on Simulation of Adaptive Behavior. Cambridge, MA: MIT Press/Bradford Books, 1991.

Learning Hill-Climbing Functions as a Strategy for Generating Behaviors in a Mobile Robot

David Pierce
Department of Computer Sciences
University of Texas at Austin
Austin, TX 78712
(dmpierce@cs.utexas.edu)

Benjamin Kuipers
Department of Computer Sciences
University of Texas at Austin
Austin, TX 78712
(kuipers@cs.utexas.edu)

Abstract

We consider the problem of a robot with uninterpreted sensors and effectors which must learn, in an unknown environment, behaviors (i.e., sequences of actions) which can be taken to achieve a given goal. This general problem involves a learning agent interacting with a reactive environment: the agent produces actions that affect the environment and in turn receives sensory feedback from the environment. The agent must learn, through experimentation, behaviors that consistently achieve the goal. The difficulty lies in the fact that the robot does not know a priori what its sensors mean, nor what effects its motor apparatus has on the world. We propose a method by which the robot may analyze its sensory information in order to derive (when possible) a function defined in terms of the sensory data which is maximized at the goal and which is suitable for hill-climbing. Given this function, the robot solves its problem by learning a behavior that maximizes the function, thereby resulting in motion to the goal.

1 The credit assignment problem

The learning problem addressed in this paper is illustrated in Figure 1. The learning agent, which we are calling a "critter," receives sensory input (vector s) from the world and acts on the world via motor outputs (represented by a, the action vector). In addition, the critter has access to a reward signal, r, by which it knows when it has achieved its goal. (In the experiments discussed later, the reward signal is incorporated into the sense vector for simplicity.)

The critter's task is to learn a behavior which reliably achieves the goal. This behavior is a sequence of actions (most likely dependent on the concomitant sequence of sense vectors) which takes the

[Figure 1: CRITTER, with labels r, s, a]
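As an illustration of the setting described above, the following is a minimal sketch and not the authors' method. It assumes a hypothetical toy world (`ToyWorld`) in which the sense vector happens to be the offset to the goal, and it assumes that a suitable function of the sense vector (`goal_function`) is already available, whereas the paper proposes deriving such a function from sensory data. The critter first experiments with each action to record its effect on the sense vector, then hill-climbs by choosing the action predicted to increase the function most. All names here are illustrative.

```python
from typing import Callable, List, Tuple

Vector = Tuple[float, float]


class ToyWorld:
    """Hypothetical 2-D world. The critter never reads `position` or `_moves`."""

    def __init__(self, start: Vector, goal: Vector = (0.0, 0.0)) -> None:
        self.position = start
        self.goal = goal
        # Hidden effect of each motor output on the world (an assumption).
        self._moves: List[Vector] = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

    @property
    def num_actions(self) -> int:
        return len(self._moves)

    def sense(self) -> Vector:
        """Return the sense vector s (here, the offset from critter to goal)."""
        return (self.goal[0] - self.position[0], self.goal[1] - self.position[1])

    def act(self, action: int) -> None:
        """Apply one motor output; the critter only sees the sensory change."""
        dx, dy = self._moves[action]
        self.position = (self.position[0] + dx, self.position[1] + dy)


def goal_function(s: Vector) -> float:
    """Assumed function of the sense vector, maximized (value 0) at the goal."""
    return -(s[0] ** 2 + s[1] ** 2)


def learn_action_effects(world: ToyWorld) -> List[Vector]:
    """Try each action once and record how it changed the sense vector.

    In this toy world the actions come in opposite pairs, so the critter
    ends the experiment where it started.
    """
    effects: List[Vector] = []
    for action in range(world.num_actions):
        before = world.sense()
        world.act(action)
        after = world.sense()
        effects.append((after[0] - before[0], after[1] - before[1]))
    return effects


def hill_climb(
    world: ToyWorld,
    effects: List[Vector],
    f: Callable[[Vector], float],
    max_steps: int = 100,
) -> int:
    """Greedily take the action whose predicted sense vector maximizes f.

    Returns the number of actions taken before no action improves f.
    """
    for step in range(max_steps):
        s = world.sense()
        predicted = [f((s[0] + d[0], s[1] + d[1])) for d in effects]
        best = max(range(len(effects)), key=predicted.__getitem__)
        if predicted[best] <= f(s):
            return step  # no action is predicted to improve f: local maximum
        world.act(best)
    return max_steps


if __name__ == "__main__":
    world = ToyWorld(start=(3.0, 2.0))
    effects = learn_action_effects(world)
    steps = hill_climb(world, effects, goal_function)
    print(steps, world.position)
```

The sketch works only because each action's effect on the sense vector is constant in this toy world. In a real environment the effects would depend on the state, and the function itself would have to be learned.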

Benjamin Kuipers | David Pierce
