Out in the World: What Did the Robot Hear and See?

It is now well established that joint attention is a key capability for socially interacting robots (Brooks et al., 1999; Kaplan and Hafner, 2004; Scassellati, 1999; Itti, 2003), and a key component of epigenetic robotics applications in general. The subject has been widely discussed; here we present one specific technical improvement for joint attention that relies on image segmentation. In the new Talking Robots experiment (Baillie, 2004), begun after a successful reimplementation of Sony's Talking Heads experiment (Steels, 1998) on the Aibo ERS7, we try to have two robots interact to evolve a shared repertoire of synchronized behaviors, or "games". This leads to a dynamic version of the Talking Heads in which the interaction protocol, or language game, is not predefined in the agents.

As is usual with experiments involving symbol grounding and social interaction, and for these experiments in particular, it is essential that the two robots can establish joint attention and share a common representation of their surrounding environment while they are looking at the same scene. Many techniques could provide this stable shared representation. We have chosen to focus on segmentation-based algorithms, which provide region shapes together with feature vectors that are easy to extract and have proved useful in the context of our experiment.

However, when segmentation algorithms are used, a slight change in viewpoint, or even residual camera noise, is enough to change the resulting image partition significantly, possibly leading to completely different perceptions of the same scene. We have developed an original measure to assess the stability of a segmentation algorithm, and we have used it on a set of algorithms to automatically determine the most stable partition for a given scene. This approach differs from classical methods for estimating the quality of a segmentation algorithm, where the algorithm's output is compared to an ideal segmentation done by hand. In our approach, the measure is computed automatically, involves only stability considerations, and could lead to interesting improvements wherever joint attention based on segmentation is required.

The poster briefly presents the background of the Talking Robots experiment and explains why joint attention and the stability of image segmentation are important issues for us. We then introduce two stability measures and show results both on natural scenes from our experiments in the lab and on test scenes used to control image parameters. The influence of several image characteristics (noise, number of objects, luminosity, ...) is carefully reviewed. An example (Fig. 1) is given below, showing the influence of noise on typical images.

Using the fact that a given algorithm can be ranked according to its stability score, computed online under the assumption that the scene itself is static, we introduce a general method of algorithm switching and use it in the experiment with different kinds of algorithms: region growing, recursive histogram splitting, CSC (Priese and Rehrmann, 1993), and split & merge (CIS). We show how this method significantly improves convergence speed in the experiment and conclude on the generality of our approach for facilitating those aspects of joint attention that rely on segmentation.
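The poster does not spell out the stability measure itself, so the following is only a minimal sketch of how such a score could be computed, assuming a Rand-style pairwise-agreement criterion between successive segmentations of a static scene. The function names (`rand_agreement`, `stability_score`) and the pixel-pair sampling scheme are our own illustration, not the authors' definition.

```python
import numpy as np

def rand_agreement(seg_a, seg_b, n_pairs=10_000, seed=0):
    """Sampled Rand-style agreement between two integer label maps.

    Draws random pixel pairs and counts how often the two segmentations
    agree on whether the two pixels fall in the same region.
    """
    rng = np.random.default_rng(seed)
    a, b = np.asarray(seg_a).ravel(), np.asarray(seg_b).ravel()
    idx = rng.integers(0, a.size, size=(n_pairs, 2))
    same_a = a[idx[:, 0]] == a[idx[:, 1]]
    same_b = b[idx[:, 0]] == b[idx[:, 1]]
    return float(np.mean(same_a == same_b))  # 1.0 = identical partitions

def stability_score(segment, frames):
    """Average pairwise agreement of `segment` over frames of a static scene.

    `segment` is a callable mapping an image to an integer label map;
    `frames` are successive camera images of the same (assumed static) scene,
    so any disagreement between partitions is attributable to noise.
    """
    segs = [segment(f) for f in frames]
    pairs = [(i, j) for i in range(len(segs)) for j in range(i + 1, len(segs))]
    return float(np.mean([rand_agreement(segs[i], segs[j]) for i, j in pairs]))
```

A score near 1.0 means the algorithm partitions the scene the same way frame after frame despite camera noise, which is exactly the property two robots need if their segmentations are to support joint attention.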

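Given such a score, the algorithm-switching step reduces to ranking the candidate algorithms online and keeping the most stable one. Again, this is an illustrative sketch building on the `stability_score` helper above, with a hypothetical `algorithms` registry; it is not the authors' implementation.

```python
def most_stable(algorithms, frames):
    """Return the name of the candidate segmentation algorithm whose
    partition of a static scene is the most stable.

    `algorithms` maps a name to a callable returning an integer label map,
    e.g. {"region_growing": ..., "histogram_split": ...,
          "csc": ..., "split_and_merge": ...}.
    """
    scores = {name: stability_score(seg, frames)
              for name, seg in algorithms.items()}
    return max(scores, key=scores.get)
```

Because the score is computed from each robot's own frames while the scene is assumed static, both robots can run the same ranking independently and converge on the same algorithm, which is what makes the switching useful for establishing joint attention.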
[1] Jeff Weber et al. MERTZ: A quest for a robust and scalable active vision humanoid head robot. 4th IEEE/RAS International Conference on Humanoid Robots, 2004.

[2] Luc Steels et al. The Origins of Syntax in Visually Grounded Robotic Agents. IJCAI, 1997.

[3] B. Scassellati. Imitation and mechanisms of joint attention: A developmental structure for building social skills on a humanoid robot. 1999.

[4] Alex Pentland et al. Learning words from sights and sounds: A computational model. Cognitive Science, 2002.

[5] Carlo Tomasi et al. Good features to track. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1994.

[6] Lutz Priese et al. A Fast Hybrid Color Segmentation Method. DAGM-Symposium, 1993.

[7] Paul A. Viola et al. Robust Real-time Object Detection. 2001.

[8] Allen Allport. Visual attention. 1989.

[9] F. Kaplan et al. The challenges of joint attention. 2006.

[10] Luc Steels et al. Aibo's first words: The social learning of language and meaning. Evolution of Communication, 2002.

[11] M. Cole et al. (eds.). Mind in Society: The Development of Higher Psychological Processes, by L. S. Vygotsky. 1978.

[12] R. Brooks et al. The Cog project: Building a humanoid robot. 1999.

[13] Paul A. Viola et al. Robust Real-Time Face Detection. International Journal of Computer Vision, 2001.

[14] Kerstin Dautenhahn et al. Getting to know each other: Artificial social intelligence for autonomous robots. Robotics and Autonomous Systems, 1995.

[15] Illah R. Nourbakhsh et al. A survey of socially interactive robots. Robotics and Autonomous Systems, 2003.

[16] Jean-Christophe Baillie et al. Grounding Symbols in Perception with two Interacting Autonomous Robots. 2004.

[17] Giulio Sandini et al. Developmental robotics: A survey. Connection Science, 2003.