Simultaneous motor and sensory learning for imitation

One important capability for future robots, if they are to move beyond the factory assembly line into daily human life, is the ability to acquire new motor skills autonomously from naïve human teachers. Recently, many new imitation learning algorithms have been proposed with promising results, such as throwing a ball into a basket [1], bouncing ping-pong balls on a racket [2], and moving a tendon-driven hand to tap a switch [3]. These robots typically start by blindly copying the demonstrator's trajectory and then iteratively improve the trajectory with respect to a predefined "goodness" function, which we call the "scoring function." Most of these tasks are spatial movements, so the scoring function can be heuristically defined from the end-effector position or the system state. In many real-life tasks, however, the outcome of an action matters more than the movement trajectory itself, so a measurable scoring function that evaluates the "goodness" of the outcome is necessary. In this work, we consider the case where the scoring function is not a trivial spatial mapping from robot movement to a scalar: for example, the softness of the sound a robot produces on a violin given the contact force and velocity of the bow, or the perceived happiness of a robotic face given the amount of facial servo movement. In such cases, the scoring function must itself be learned, either from human demonstrations prior to motor learning or from the robot's own exploration during motor learning with feedback from a teacher. We therefore develop the learning framework shown in Fig. 1, in which both the scoring function and the motor policy are learned at the same time. The two learning systems need not be fully synchronized; for example, the human teacher may give the robot feedback on only some of the executions and let the robot explore by itself the rest of the time.
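To make the interplay between the two learners concrete, the following is a minimal Python sketch of one possible instantiation of such a framework. Everything in it is an assumption for illustration only: `execute_policy`, `teacher_feedback`, the linear scoring model, and the perturbation-based policy update are hypothetical placeholders, not the method described in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical stand-ins for the robot and the teacher (illustration only) ---
def execute_policy(theta):
    """Run one rollout with policy parameters theta; return outcome features
    (e.g., bow contact force and velocity in the violin example)."""
    return theta + 0.1 * rng.standard_normal(theta.shape)

def teacher_feedback(outcome):
    """Scalar 'goodness' score from the human teacher (hidden ground truth)."""
    target = np.array([0.5, -0.3])
    return -np.sum((outcome - target) ** 2)

# --- Learned scoring function: a simple linear-in-features regressor ---
def features(outcome):
    return np.concatenate([outcome, outcome ** 2, [1.0]])

w = np.zeros(5)          # scoring-function weights (sensory learning)
theta = np.zeros(2)      # motor policy parameters (motor learning)
alpha_w, alpha_theta = 0.1, 0.05

for episode in range(500):
    # Explore: perturb the current policy and execute one rollout.
    eps = 0.2 * rng.standard_normal(theta.shape)
    outcome = execute_policy(theta + eps)

    # Sensory learning: the teacher scores only some executions (here, 1 in 5).
    if episode % 5 == 0:
        y = teacher_feedback(outcome)
        phi = features(outcome)
        w += alpha_w * (y - w @ phi) * phi   # gradient step on squared error

    # Motor learning: improve the policy against the *learned* score.
    score = w @ features(outcome)
    baseline = w @ features(execute_policy(theta))
    theta += alpha_theta * (score - baseline) * eps  # finite-difference-style update
```

The `episode % 5` check is where the sketch reflects the loose coupling described above: the scoring function is updated only when teacher feedback arrives, while the motor policy improves on every rollout using the current learned score.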

[1] E. Theodorou et al., "Tendon-Driven Variable Impedance Control Using Reinforcement Learning," in Robotics: Science and Systems, 2012.

[2] E. Todorov et al., "Policy gradient methods with model predictive control applied to ball bouncing," 2011.

[3] A. Billard et al., "Donut as I do: Learning from failed demonstrations," in 2011 IEEE International Conference on Robotics and Automation, 2011.