Virtual Embodiment: A Scalable Long-Term Strategy for Artificial Intelligence Research

Meaning has been called the "holy grail" of a variety of scientific disciplines, ranging from linguistics and philosophy to psychology and the neurosciences. The field of Artificial Intelligence (AI) is very much a part of that list: the development of sophisticated natural language semantics is a sine qua non for achieving a level of intelligence comparable to humans. Embodiment theories in cognitive science hold that human semantic representation depends on sensorimotor experience; the abundant evidence that human meaning representation is grounded in the perception of physical reality leads to the conclusion that meaning must depend on a fusion of multiple (perceptual) modalities. Despite this, AI research in general, and its subdisciplines such as computational linguistics and computer vision in particular, have focused primarily on tasks that involve a single modality. Here, we propose virtual embodiment as an alternative, long-term strategy for AI research that is multi-modal in nature and that allows for the kind of scalability required to develop the field coherently and incrementally, in an ethically responsible fashion.
