Object Embodiment in a Multimodal Simulation

In this paper, we introduce a multimodal environment and semantics for facilitating communication and interaction with a computational agent, acting as a proxy for a robot. To this end, we have created an embodied 3D simulation enabling both the generation and interpretation of multiple modalities, including language, gesture, and the visualization of objects moving and agents acting in their environment. Objects are encoded with rich semantic typing and action affordances, while actions themselves are encoded as multimodal expressions (programs), allowing for contextually salient inferences and decisions in the environment. A minimal sketch of such an object encoding is given below.
