Machine learning for developmental robotics

Developmental robotics studies the process whereby a robot incrementally acquires increasingly complex cognitive skills. This approach draws inspiration from biology to tackle the ultimate goal of robotics, i.e. intelligent, robust machines operating in open-ended environments. The main idea is to equip the robot with a set of predefined (pre-programmed) skills and then let it follow learning processes that build new skills on top of its current knowledge. Acquiring new skills requires combining many learning processes, such as unsupervised self-exploration, learning by observation and reinforcement learning. The success of the approach also depends heavily on the models the robot uses to represent and exploit this knowledge. In this work we discuss which machine learning methods and models are used, and which could be used, to create a robot that develops autonomously. We discuss a possible developmental pathway whereby a robot acquires the capability to learn by imitation. It is composed of three levels: 1) sensory-motor coordination; 2) world interaction; and 3) imitation. At each stage the system learns more about its own body and about the world. The newly acquired knowledge enables and facilitates learning at the next level. We focus on two main problems, related to the world interaction and imitation phases respectively: learning the properties and dynamics of objects (affordances) and inferring task descriptions from observations (imitation). Affordances represent the behaviour of objects in terms of the robot’s motor and perceptual skills. This type of knowledge plays a crucial role in developmental robotics, since it is at the core of many higher-level skills such as imitation. We propose a general affordance model based on Bayesian networks. This model describes the inter-relations between actions, object features and observable effects. The robot learns the structure and parameters of the network by interacting with different objects. 
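As an illustration of the affordance model described above, the following is a minimal sketch in Python, not the implementation used in this work. It assumes a fixed network structure in which the effect node has the action and one discrete object feature (shape) as parents, so only the parameters are learnt; structure learning is omitted. The action, shape and effect labels, the trial data and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical discrete vocabularies for the three kinds of nodes.
ACTIONS = ("tap", "grasp")
SHAPES = ("ball", "box")
EFFECTS = ("rolls", "stays", "lifted")


def learn_cpt(trials, alpha=1.0):
    """Estimate P(effect | action, shape) from (action, shape, effect)
    interaction trials, using Laplace smoothing with pseudo-count alpha."""
    counts = Counter(trials)
    cpt = {}
    for a in ACTIONS:
        for s in SHAPES:
            total = sum(counts[(a, s, e)] for e in EFFECTS) + alpha * len(EFFECTS)
            cpt[(a, s)] = {e: (counts[(a, s, e)] + alpha) / total for e in EFFECTS}
    return cpt


def predict_effect(cpt, action, shape):
    """Most probable effect of applying `action` to an object of `shape`."""
    dist = cpt[(action, shape)]
    return max(dist, key=dist.get)


def action_posterior(cpt, shape, effect):
    """P(action | shape, effect) by Bayes' rule, assuming a uniform action prior."""
    weights = {a: cpt[(a, shape)][effect] for a in ACTIONS}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}


# Made-up interaction log standing in for the robot's self-exploration.
trials = (
    [("tap", "ball", "rolls")] * 8 + [("tap", "ball", "stays")] * 2
    + [("tap", "box", "stays")] * 9 + [("tap", "box", "rolls")] * 1
    + [("grasp", "ball", "lifted")] * 7 + [("grasp", "ball", "rolls")] * 3
    + [("grasp", "box", "lifted")] * 10
)
cpt = learn_cpt(trials)
```

The same table supports both directions of inference: `predict_effect(cpt, "tap", "ball")` predicts the outcome of an action, while `action_posterior(cpt, "ball", "lifted")` ranks the actions that could have produced an observed effect, which is the kind of query needed when interpreting a demonstration.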
Knowledge of the world in turn enables social interaction. The nature of this third phase is very different from the previous ones: it requires the robot to interact with a teacher, who in turn provides supervision or reinforcement. We develop an imitation learning methodology for a humanoid robot that uses the previously acquired general world model to infer the task to be learnt from the teacher’s demonstrations. At the core of our method is the recently proposed Bayesian inverse reinforcement learning algorithm. The challenges are to reuse a general, task-independent model and to estimate the appropriate rewards and policies. The proposed framework gives rise to several important issues and directions for future research. For instance, generalizing robot-object interaction knowledge requires taking into account groups of objects, sequences of actions and delayed effects. Active learning strategies should be implemented to deal with huge search spaces. Another important point is the interaction between the different learning processes, e.g., the evolution of actions from pure joint positions or velocities to (possibly parameterized) motion primitives. Finally, it is important to understand how inaccuracies in the learnt models affect subsequent steps. For example, in the imitation stage, errors in the recognition of the demonstration may affect the learning of the demonstrated task.
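The task-inference step can be sketched as follows. This is a simplified Python illustration of the Bayesian inverse reinforcement learning idea under strong assumptions, not the method as implemented in this work: the world model is a hypothetical four-state deterministic chain, the candidate rewards are a small enumerable set (one goal state each) with a uniform prior, and the posterior is computed by exhaustive enumeration rather than by sampling. All names and constants are illustrative.

```python
import math

# Hypothetical world model: a chain of 4 states with actions left (-1) and right (+1).
N_STATES, ACTIONS, GAMMA = 4, (-1, +1), 0.9


def step(s, a):
    """Deterministic transition; moves are clipped at the ends of the chain."""
    return min(max(s + a, 0), N_STATES - 1)


def q_values(goal, sweeps=200):
    """Value iteration for the reward 'receive 1 whenever the goal state is entered'."""
    v = [0.0] * N_STATES
    q = {}
    for _ in range(sweeps):
        q = {(s, a): (1.0 if step(s, a) == goal else 0.0) + GAMMA * v[step(s, a)]
             for s in range(N_STATES) for a in ACTIONS}
        v = [max(q[(s, a)] for a in ACTIONS) for s in range(N_STATES)]
    return q


def log_likelihood(demos, q, beta=2.0):
    """Log-probability of the demonstrated (state, action) pairs under a
    Boltzmann-rational teacher with confidence parameter beta."""
    ll = 0.0
    for s, a in demos:
        log_z = math.log(sum(math.exp(beta * q[(s, b)]) for b in ACTIONS))
        ll += beta * q[(s, a)] - log_z
    return ll


def reward_posterior(demos, beta=2.0):
    """Posterior over candidate goals given the demonstrations, uniform prior."""
    logs = {g: log_likelihood(demos, q_values(g), beta) for g in range(N_STATES)}
    m = max(logs.values())  # subtract the maximum for numerical stability
    w = {g: math.exp(l - m) for g, l in logs.items()}
    z = sum(w.values())
    return {g: x / z for g, x in w.items()}


# The teacher is observed moving right from states 0, 1 and 2.
demos = [(0, +1), (1, +1), (2, +1)]
posterior = reward_posterior(demos)
```

With these demonstrations the posterior concentrates on the rightmost state as the goal. In a realistic setting the reward space is continuous and the posterior would be approximated by sampling; enumeration is used here only to keep the example small.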
