Grounding Symbols in Perception with Two Interacting Autonomous Robots

Grounding symbolic representations in perception is a key and difficult problem for artificial intelligence. The "Talking Heads" experiment (Steels and Kaplan, 2002) explores the coupling between grounding and the social learning of language. In the first version of this experiment, two cameras interacted in a simplified visual environment made of colored shapes on a white board and developed a shared, grounded lexicon. We present the beginning of a new experiment that extends the original one, replacing the two cameras with two autonomous robots and the simplified scene with a complex, unconstrained visual environment. We review the difficulties raised specifically by the embodiment of the agents and propose some directions for addressing them.

[1]  Luc Steels,et al.  The Origins of Syntax in Visually Grounded Robotic Agents , 1997, IJCAI.

[2]  Paul Vogt,et al.  Bootstrapping grounded symbols by minimal autonomous robots , 2000 .

[3]  Narendra Ahuja,et al.  Detecting Faces in Images: A Survey , 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[4]  Takeo Kanade,et al.  Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[5]  Jeffrey Mark Siskind.  Grounding language in perception , 2004, Artificial Intelligence Review.

[6]  B. Scassellati.  Imitation and mechanisms of joint attention: a developmental structure for building social skills on a humanoid robot , 1999 .

[7]  Luc Steels,et al.  Bootstrapping grounded word semantics , 1999 .

[8]  Radu Horaud,et al.  Object pose from 2-D to 3-D point and line correspondences , 1995, International Journal of Computer Vision.

[9]  Luc Steels,et al.  AIBO's first words: The social learning of language and meaning , 2002, Evolution of Communication.

[10]  Rodney A. Brooks,et al.  Elephants don't play chess , 1990, Robotics Auton. Syst.

[11]  Lutz Priese,et al.  Fast and Robust Segmentation of Natural Color Scenes , 1998, ACCV.

[12]  T. Ziemke,et al.  Rethinking Grounding , 1997 .

[13]  Heinrich Müller,et al.  Interaction with a projection screen using a camera-tracked laser pointer , 1998, Proceedings 1998 MultiMedia Modeling. MMM'98.

[14]  Shaogang Gong,et al.  Multi-view face detection using support vector machines and eigenspace modelling , 2000, KES'2000. Fourth International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies. Proceedings.

[15]  Jonathan D. Baker,et al.  Multiresolution statistical object recognition , 1994 .

[16]  Takeo Kanade,et al.  Probabilistic modeling of local appearance and spatial relationships for object recognition , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Alex Pentland,et al.  Learning words from sights and sounds: a computational model , 2002, Cogn. Sci.

[18]  Luc Steels,et al.  Language games for autonomous robots , 2001 .