Multimodal Coreference Resolution for Exploratory Data Visualization Dialogue: Context-Based Annotation and Gesture Identification

The goals of our work are twofold: to gain insight into how humans interact with complex data and visualizations of that data in order to make discoveries, and to use our findings to develop a dialogue system for exploring data visualizations. Crucial to both goals is the understanding and modeling of multimodal referential expressions, in particular those that include deictic gestures. In this paper, we discuss how context information affects the interpretation of requests and their attendant referring expressions in our data. To this end, we have annotated our multimodal dialogue corpus for context as well as for utterance and gesture information; we have analyzed whether a gesture co-occurs with a specific request or with the context surrounding the request; we have started addressing multimodal coreference resolution by using Kinect to detect deictic gestures; and we have started identifying themes found in the annotated context, especially in what follows the request.
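
The abstract mentions using Kinect to detect deictic (pointing) gestures toward the display. The sketch below is a minimal illustration of one way such detection could work, not the authors' actual pipeline: it assumes 3D shoulder, elbow, and hand joint positions have already been extracted from a Kinect skeleton frame, treats the arm as pointing when the upper arm and forearm are nearly collinear, and intersects the elbow-to-hand ray with an assumed display plane. The function name, thresholds, and display geometry are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the authors' pipeline): given 3D shoulder, elbow, and
# hand joints already extracted from a Kinect skeleton frame, decide whether
# the arm is extended toward the display and, if so, estimate where the
# pointing ray hits the display plane. Thresholds and the display plane
# z = display_z are illustrative assumptions.

def pointing_target(shoulder, elbow, hand,
                    display_z=0.0, extension_threshold=0.95):
    """Return the (x, y) point where the elbow-to-hand ray meets the plane
    z = display_z, or None if the arm is not extended enough to count as a
    deictic (pointing) gesture."""
    shoulder, elbow, hand = map(np.asarray, (shoulder, elbow, hand))

    upper_arm = elbow - shoulder
    forearm = hand - elbow

    # Treat the arm as "extended" when upper arm and forearm are nearly
    # collinear (cosine close to 1); the threshold is chosen for illustration.
    cos_angle = np.dot(upper_arm, forearm) / (
        np.linalg.norm(upper_arm) * np.linalg.norm(forearm) + 1e-9)
    if cos_angle < extension_threshold:
        return None

    direction = forearm / (np.linalg.norm(forearm) + 1e-9)
    if abs(direction[2]) < 1e-6:      # ray runs parallel to the display plane
        return None

    # Intersect the ray hand + t * direction with the plane z = display_z.
    t = (display_z - hand[2]) / direction[2]
    if t < 0:                         # arm is pointing away from the display
        return None

    target = hand + t * direction
    return float(target[0]), float(target[1])


# Example with made-up camera-space coordinates (meters):
print(pointing_target(shoulder=(0.2, 0.4, 2.0),
                      elbow=(0.3, 0.3, 1.7),
                      hand=(0.4, 0.2, 1.4)))
```

In a full system, the resulting display coordinates would presumably still have to be mapped onto on-screen visualization elements before the gesture could contribute to resolving a referring expression.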
