Crossmodal content binding in information-processing architectures

Operating in a physical context, an intelligent robot faces two fundamental problems. First, it needs to combine information from its different sensors to form a representation of the environment that is more complete than any representation a single sensor could provide. Second, it needs to combine high-level representations (such as those for planning and dialogue) with sensory information, to ensure that the interpretations of these symbolic representations are grounded in the situated context. Previous approaches to these problems have used techniques such as (low-level) information fusion, ontological reasoning, and (high-level) concept learning. This paper presents a framework in which these, and related approaches, can be used to form a shared representation of the current state of the robot in relation to its environment and other agents. Preliminary results from an implemented system are presented to illustrate how the framework supports behaviours commonly required of an intelligent robot.
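To make the core idea concrete, the sketch below shows one plausible way to bind modality-specific content into a shared representation: each subsystem contributes a feature-bearing proxy, and proxies whose features do not conflict are merged into a common union. This is a minimal illustration under assumed names (Proxy, Union, bind are hypothetical), not the paper's actual implementation.

```python
# Minimal illustrative sketch (hypothetical names, not the paper's code):
# binding modality-specific "proxies" into shared "unions" when their
# features are mutually compatible.

from dataclasses import dataclass, field


@dataclass
class Proxy:
    """Content produced by one subsystem (e.g. vision, dialogue)."""
    modality: str
    features: dict  # e.g. {"colour": "red", "shape": "mug"}


@dataclass
class Union:
    """A shared representation formed from compatible proxies."""
    proxies: list = field(default_factory=list)

    def merged_features(self) -> dict:
        merged = {}
        for proxy in self.proxies:
            merged.update(proxy.features)
        return merged

    def compatible(self, proxy: Proxy) -> bool:
        # A proxy can join the union if no shared feature disagrees.
        current = self.merged_features()
        return all(current.get(k, v) == v for k, v in proxy.features.items())


def bind(unions: list, proxy: Proxy) -> None:
    """Attach the proxy to the first compatible union, or start a new one."""
    for union in unions:
        if union.compatible(proxy):
            union.proxies.append(proxy)
            return
    unions.append(Union(proxies=[proxy]))


# Example: a visual detection and a dialogue reference end up bound together,
# giving a single representation richer than either modality alone.
unions: list = []
bind(unions, Proxy("vision", {"colour": "red", "shape": "mug"}))
bind(unions, Proxy("dialogue", {"colour": "red", "referent": "the red mug"}))
print(len(unions), unions[0].merged_features())
```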
