Embodied Human Computer Interaction

In this paper, we argue that embodiment can play an important role in the design and modeling of systems developed for Human Computer Interaction. To this end, we describe a simulation platform for building Embodied Human Computer Interactions (EHCI). This system, VoxWorld, enables multimodal dialogue systems that communicate through language, gesture, action, facial expressions, and gaze tracking, in the context of task-oriented interactions. A multimodal simulation is an embodied 3D virtual realization of both the situational environment and the co-situated agents, as well as the most salient content denoted by communicative acts in a discourse. It is built on the modeling language VoxML [19], which encodes objects with rich semantic typing and action affordances, and actions themselves as multimodal programs, enabling contextually salient inferences and decisions in the environment. VoxWorld enables embodied HCI by situating both human and artificial agents within the same virtual simulation environment, where they share perceptual and epistemic common ground. We discuss the formal and computational underpinnings of embodiment and common ground, how they interact, and how they specify parameters of the interaction between humans and artificial agents, and we demonstrate behaviors and types of interactions with different classes of artificial agents.
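To give a concrete sense of the kind of encoding such entries involve, the sketch below (in Python) shows how an object with semantic typing and action affordances might be represented. It is a minimal illustration only: the class and field names (VoxObject, lex, sem_type, concavity, the habitat strings) are hypothetical stand-ins loosely inspired by VoxML's attribute structure, not the VoxML specification itself.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Affordance:
    """One action the object affords, conditioned on a habitat."""
    habitat: str  # configuration in which the affordance is active, e.g. "upright"
    action: str   # the behavior the object participates in
    result: str   # state holding after the action completes

@dataclass
class VoxObject:
    """Illustrative stand-in for a VoxML-style object entry."""
    lex: str                  # lexical predicate, e.g. "cup"
    sem_type: str             # semantic head type, e.g. "physobj"
    concavity: str            # geometric property used for containment inference
    affordances: List[Affordance] = field(default_factory=list)

# A cup affords grasping, and (being concave and upright) containment.
cup = VoxObject(
    lex="cup",
    sem_type="physobj",
    concavity="concave",
    affordances=[
        Affordance("graspable", "grasp(agent, cup)", "hold(agent, cup)"),
        Affordance("upright", "put(x, in(cup))", "contain(cup, x)"),
    ],
)

# An agent reasoning in the simulation can query these affordances to
# decide which actions the current situation supports for this object.
for aff in cup.affordances:
    print(f"{cup.lex} [{aff.habitat}]: {aff.action} -> {aff.result}")
```

Entries of this kind are what license the contextually salient inferences mentioned above: a concave object in its upright habitat, for example, affords placing things inside it.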

[1] Staffan Larsson et al. Modelling Language, Action, and Perception in Type Theory with Records, 2012, CSLP.

[2] Herbert H. Clark et al. Grounding in communication, 1991, Perspectives on Socially Shared Cognition.

[3] Robin Cooper et al. Interfacing language, spatial perception and cognition in Type Theory with Records, 2017, J. Lang. Model.

[4] Thies Pfeiffer et al. Pointing and reference reconsidered, 2015.

[5] James Pustejovsky et al. Deictic Adaptation in a Virtual Environment, 2018, Spatial Cognition.

[6] William J. Clancey. Situated Action: A Neuropsychological Interpretation (Response to Vera and Simon), 1993, Cogn. Sci.

[7] Jan A. Plaza et al. Logics of public communications, 2007, Synthese.

[8] Nicholas Asher et al. Common Ground, Corrections, and Coordination, 2003.

[9] Robert Stalnaker et al. Common Ground, 2002.

[10] Noah D. Goodman et al. A rational speech-act model of projective content, 2016, CogSci.

[11] J. Gibson. The Ecological Approach to Visual Perception, 1979.

[12] M. Tomasello et al. Shared intentionality, 2007, Developmental Science.

[13] Michael L. Anderson et al. Embodied Cognition: A field guide, 2003, Artif. Intell.

[14] Kerstin Fischer et al. How People Talk with Robots: Designing Dialog to Reduce User Uncertainty, 2011, AI Mag.

[15] Vyvyan Evans. Language and Time: A Cognitive Linguistics Approach, 2013.

[16] P. Dekker. Predicate logic with anaphora, 1994.

[17] Anthony G. Cohn et al. QSRlib: a software library for online acquisition of qualitative spatial relations from video, 2016.

[18] Stefan Kopp et al. Gesture in Embodied Communication and Human-Computer Interaction: 8th International Gesture Workshop, GW 2009, Bielefeld, Germany, February 25-27, 2009, Revised Selected Papers, 2010.

[19] James Pustejovsky et al. VoxML: A Visualization Modeling Language, 2016, LREC.

[20] Matthew Stone et al. Formal Semantics for Iconic Gesture, 2006.

[21] Alexander I. Rudnicky et al. Towards evaluating recovery strategies for situated grounding problems in human-robot dialogue, 2013, IEEE RO-MAN.

[22] James Pustejovsky et al. Lexical Knowledge Representation and Natural Language Processing, 1993, Artif. Intell.

[23] B. Bergen. Louder Than Words: The New Science of How the Mind Makes Meaning, 2012.

[24] Jerome Feldman et al. Embodied language, best-fit analysis, and formal compositionality, 2010, Physics of Life Reviews.

[25] James Pustejovsky et al. The Generative Lexicon, 1995, CL.

[26] Ewan Klein et al. Type-driven translation, 1985.

[27] Chris Barker et al. Continuations and Natural Language, 2014, Oxford Studies in Theoretical Linguistics.

[28] Ron Chrisley et al. Embodied artificial intelligence, 2003, Artif. Intell.

[29] J. Pustejovsky et al. The Lexicon, 2019.

[30] Laurent Prevot et al. Grounding Information in Route Explanation Dialogues, 2009, Spatial Language and Dialogue.

[31] Robin Cooper et al. Records and Record Types in Semantic Theory, 2005, J. Log. Comput.

[32] Douglas Herrmann et al. A Taxonomy of Part-Whole Relations, 1987, Cogn. Sci.

[33] James Pustejovsky et al. Multimodal Semantic Simulations of Linguistically Underspecified Motion Events, 2016, Spatial Cognition.

[34] James Pustejovsky et al. The syntax of event structure, 1991, Cognition.

[35] Stephen Clark et al. Virtual Embodiment: A Scalable Long-Term Strategy for Artificial Intelligence Research, 2016, arXiv.

[36] Hadas Kress-Gazit et al. Robots That Use Language, 2020, Annu. Rev. Control. Robotics Auton. Syst.

[37] Norbert Reithinger et al. Conversation is multimodal: thus conversational user interfaces should be as well, 2019, CUI.

[38] Kenny R. Coventry et al. Spatial prepositions and the functional geometric framework: Towards a classification of extra-geometric influences, 2005.

[39] J. Tenenbaum et al. Probabilistic models of cognition: exploring representations and inductive biases, 2010, Trends in Cognitive Sciences.

[40] Robin Cooper et al. Adapting Type Theory with Records for Natural Language Semantics, 2017.

[41] J. Cassell et al. Embodied conversational agents, 2000.

[42] Matthias Scheutz et al. Toward Humanlike Task-Based Dialogue Processing for Human Robot Interaction, 2011, AI Mag.

[43] James Pustejovsky et al. VoxSim: A Visual Platform for Modeling Motion Language, 2016, COLING.

[44] Tim Fernando et al. Situations in LTL as strings, 2009, Inf. Comput.

[45] Robin Cooper et al. Type Theory with Records for Natural Language Semantics, 2015.

[46] Elizabeth Boyle et al. Mixed Reality Deictic Gesture for Multi-Modal Robot Communication, 2019, 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI).

[47] Josef van Genabith et al. Discourse representation theory, 1988.

[48] Herbert A. Simon et al. Situated Action: A Symbolic Interpretation, 1993, Cogn. Sci.

[49] Nikhil Krishnaswamy et al. Monte Carlo Simulation Generation Through Operationalization of Spatial Primitives, 2017.

[50] Dana H. Ballard et al. Generalizing the Hough transform to detect arbitrary shapes, 1981, Pattern Recognit.

[51] Christina Unger et al. Dynamic Semantics as Monadic Computation, 2011, JSAI-isAI Workshops.

[52] Isaac Wang et al. EGGNOG: A Continuous, Multi-modal Data Set of Naturally Occurring Gestures with Ground Truth Labels, 2017, 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[53] Noah D. Goodman et al. Theory learning as stochastic search in the language of thought, 2012.

[54] Eva Hornecker et al. Theories of embodiment in HCI, 2013.

[55] James Pustejovsky et al. The Qualitative Spatial Dynamics of Motion in Language, 2011, Spatial Cogn. Comput.

[56] K. J. Craik et al. The nature of explanation, 1944.

[57] James Pustejovsky et al. A Type Composition Logic for Generative Lexicon, 2013, Advances in Generative Lexicon Theory.

[58] Bruce A. Draper et al. Cooperating with Avatars Through Gesture, Language and Action, 2018, IntelliSys.

[59] Pierre Lison et al. Situated Dialogue Processing for Human-Robot Interaction, 2010, Cognitive Systems.

[60] Mark Weiser et al. The computer for the 21st Century, 1991, IEEE Pervasive Computing.

[61] Johan van Benthem et al. Logical Dynamics of Information and Interaction, 2014.

[62] Bruce A. Draper et al. Communicating and Acting: Understanding Gesture in Simulation Semantics, 2017, IWCS.

[63] R. Gordon. Folk Psychology as Simulation, 1986.

[64] P. Schlenker. Gestural Cosuppositions within the Transparency Theory, 2019, Linguistic Inquiry.

[65] Jonathan Ginzburg et al. Computational Models of Dialogue, 2010.

[66] Matthew Stone et al. A Formal Semantic Analysis of Gesture, 2009, J. Semant.

[68] Frédéric Landragin et al. Visual perception, language and gesture: A model for their understanding in multimodal dialogue systems, 2006, Signal Process.

[69] Hao Yan et al. Coordination and context-dependence in the generation of embodied conversation, 2000, INLG.

[70] James Pustejovsky et al. Interpreting Motion: Grounded Representations for Spatial Language, 2012, Explorations in Language and Space.

[71] R. Naumann et al. Aspects of changes: a dynamic event semantics, 2001, J. Semant.

[72] James Pustejovsky et al. From actions to events, 2018, Benjamins Current Topics.

[73] Mark L. Johnson. The body in the mind: the bodily basis of meaning, 1987.

[75] James Pustejovsky et al. User-Aware Shared Perception for Embodied Agents, 2019, IEEE International Conference on Humanized Computing and Communication (HCC).

[76] Deb Roy et al. Semiotic schemas: A framework for grounding language in action and perception, 2005, Artif. Intell.

[77] Wolfgang Wahlster et al. Dialogue Systems Go Multimodal: The SmartKom Experience, 2006, SmartKom.

[78] Jan van Eijck et al. Computational Semantics with Functional Programming, 2010.

[79] Johan van Benthem. Logic and the flow of information, 1995.

[80] Jeroen Groenendijk et al. Dynamic predicate logic, 1991.

[81] João Manuel R. S. Tavares et al. A new approach for merging edge line segments, 1995.

[82] Nicholas Asher et al. A Type Driven Theory of Predication with Complex Types, 2008, Fundam. Informaticae.

[83] James Pustejovsky et al. Embodied Human-Computer Interactions through Situated Grounding, 2020, IVA.

[84] Mary Ellen Foster. Enhancing Human-Computer Interaction with Embodied Conversational Agents, 2007, HCI.

[85] W. Hoek et al. Dynamic Epistemic Logic, 2007.

[86] Bernhard Beckert et al. Dynamic Logic, 2007, The KeY Approach.

[87] G. Miller et al. Language and Perception, 1976.

[88] A. Kendon. Gesture: Visible Action as Utterance, 2004.

[89] Nicholas Asher et al. SDRT and Continuation Semantics, 2010, JSAI-isAI Workshops.

[90] Chris L. Baker et al. Rational quantitative attribution of beliefs, desires and percepts in human mentalizing, 2017, Nature Human Behaviour.

[91] Alex Lascarides et al. Logics of Conversation, 2005, Studies in Natural Language Processing.