Paper information - Learning and forgetting for perception-action: a projection pursuit and density adaptive approach

Learning and forgetting for perception-action: a projection pursuit and density adaptive approach

We study the learning of perception-action relations, using visually driven grasping as an example task. The well-established technique of non-parametric Projection Pursuit Regression (PPR) is used to accomplish reinforcement learning by searching for projections of high-dimensional data sets that capture invariants in the distribution of reinforcement in the parameter space. The variable-resolution $2^k$-tree, a generalized quadtree, is used to represent perception-action maps based on the resulting reinforcement regression function. We also pursue the following problem: how can we use human expertise and insight into grasping to train a system to select gripper approach directions and orientations for grasping, and then have it verify and adapt its skills through trial and error? To accomplish this learning we develop a new Density Adaptive Reinforcement Learning algorithm. This algorithm uses statistical tests to identify regions of the attribute space in which the dynamics of the task change and the density of exemplars is high, and it concentrates the building of high-resolution descriptions in those areas. To adapt the default rules to those the robot needs, the system must be able to forget previous experiences that no longer reflect the behavior of the world. A general-purpose Density Adaptive forgetting algorithm has been developed that can be used as a front end for a variety of learning methods. Additionally, by setting the forgetting parameters appropriately, an upper bound on the number of exemplars stored in the system may be selected. This is important because all memory-based learning systems have finite memory in practice. The approach is verified through simulation and experimentation. A robotic system incorporating two robots with a gripper, compliant instrumented wrist, arm, camera and laser scanner is used for experimentation.
Since trial-and-error learning implies that failures will occur, the mechanics of the untrained robotic system must tolerate mistakes during learning without being damaged by excessive forces. We address this with an instrumented, compliant robot wrist that controls impact forces.
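
The abstract does not spell out the forgetting rule, so the following is only a minimal sketch of the general idea of density-dependent forgetting with a bounded exemplar store, not the author's algorithm. The class name `ForgettingMemory`, the parameters `k`, `radius` and `capacity`, and the rule itself (drop the oldest stored neighbor when the neighborhood of a new exemplar is already dense, otherwise drop the globally oldest exemplar once the store is full) are all assumptions made for illustration.

```python
import math


class ForgettingMemory:
    """Hypothetical exemplar store that forgets old data where it is dense.

    Illustrative sketch only; the names and the forgetting rule are
    assumptions, not taken from the original work.
    """

    def __init__(self, k=3, radius=0.5, capacity=100):
        self.k = k                # neighbor count that marks a region as dense
        self.radius = radius      # neighborhood size in attribute space
        self.capacity = capacity  # hard upper bound on stored exemplars
        self.exemplars = []       # (timestamp, attributes, outcome), oldest first
        self.clock = 0

    def add(self, x, outcome):
        """Store a new exemplar, forgetting an older one if needed."""
        # Stored exemplars within `radius` of the new point, oldest first.
        near = [e for e in self.exemplars if math.dist(e[1], x) <= self.radius]
        if len(near) >= self.k:
            # Dense region: the oldest neighbor is replaced by newer evidence.
            self.exemplars.remove(near[0])
        elif len(self.exemplars) >= self.capacity:
            # Sparse region but memory is full: drop the globally oldest.
            self.exemplars.pop(0)
        self.exemplars.append((self.clock, tuple(x), outcome))
        self.clock += 1


# Repeated trials at the same attribute vector keep only the newest k outcomes.
memory = ForgettingMemory(k=2, radius=1.0, capacity=5)
for trial in range(10):
    memory.add((0.0, 0.0), outcome=trial)
print([e[2] for e in memory.exemplars])
```

Because every insertion at capacity first removes one exemplar, the store in this sketch never holds more than `capacity` items, which mirrors the bounded-memory property mentioned in the abstract. Any memory-based learner could then be trained on `memory.exemplars` in place of the full history.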

Marcos Salganicoff