Mental imagery for a conversational robot

Robots that engage in fluid face-to-face spoken conversation with people must have ways to connect what they say to what they see. A critical aspect of how language connects to vision is that language encodes points of view: the meanings of "my left" and "your left" differ due to an implied shift of visual perspective. The connection of language to vision also relies on object permanence, since we can talk about things that are not in view. For a robot to participate in situated spoken dialog, it must therefore be able to imagine shifts of perspective and to maintain object permanence. We present a set of representations and procedures that enable a robotic manipulator to maintain a "mental model" of its physical environment by coupling active vision to physical simulation. Within this model, "imagined" views can be generated from arbitrary perspectives, providing the basis for situated language comprehension and production. We describe an initial application of this mental imagery facility to spatial language understanding in an interactive robot.
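The perspective shift behind "my left" versus "your left" can be made concrete as a change of reference frame: the same world point is classified relative to each agent's own position and facing direction. The sketch below is a minimal illustration of that idea, not the paper's implementation; the 2D geometry, the function names, and the example scene are all assumptions made for exposition.

```python
import numpy as np

def make_frame(position, forward):
    """Build a 2D reference frame (origin plus unit axes) for an agent.

    position: agent's location in world coordinates (x, y).
    forward:  direction the agent is facing, in world coordinates.
    """
    fwd = np.asarray(forward, dtype=float)
    fwd /= np.linalg.norm(fwd)
    left = np.array([-fwd[1], fwd[0]])  # forward rotated 90 deg counter-clockwise
    return np.asarray(position, dtype=float), fwd, left

def side_of(point, frame):
    """Classify a world point as 'left' or 'right' of the given agent frame."""
    origin, fwd, left = frame
    offset = np.asarray(point, dtype=float) - origin
    return "left" if np.dot(offset, left) > 0 else "right"

# Hypothetical scene: robot at the origin facing +y; a human stands
# opposite, facing back toward the robot.
robot = make_frame(position=(0, 0), forward=(0, 1))
human = make_frame(position=(0, 2), forward=(0, -1))

cup = (-0.5, 1.0)  # an object placed between the two agents
print("robot's", side_of(cup, robot))  # -> robot's left
print("human's", side_of(cup, human))  # -> human's right
```

The same object comes out as "left" in the robot's frame and "right" in the human's, which is exactly the kind of frame-dependent interpretation a situated language system has to resolve before it can comprehend or produce spatial descriptions.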
