Intelligent Assistant for Exploring Data Visualizations

Visualization, while an effective tool for identifying patterns and insights, requires expert knowledge because of the challenges involved in translating user queries into visual encodings. Research has shown that a natural language interface (NLI) can alleviate these challenges, since the user can simply talk to a computer that produces the graphs directly. In this paper, we discuss our intelligent assistant, which processes speech and hand-pointing gestures while managing any number of visualizations on a large screen display. Our evaluation shows that the system produces visualizations quickly, is particularly effective at responding to less ambiguous queries, and in certain cases can handle ambiguous or complex queries as well.

Exploring large datasets with visualizations makes gathering insights easier, since patterns can be identified quickly and trends compared across visualizations. However, for users unfamiliar with visualization, Grammel, Tory, and Storey (2010) point out the steep learning curve involved in translating high-level queries into visual representations. While popular tools such as Tableau can help facilitate the process, learning a new user interface presents its own challenges that can overwhelm the user. Advances in speech recognition and natural language processing, which have led to the commercial success of natural language interfaces (NLIs) such as the virtual assistants Apple Siri and Amazon Alexa, have been a key focus of the research community in alleviating these challenges. In particular, various interactive visualization systems have been proposed (Cox et al. 2001; Reithinger et al. 2005; Sun et al. 2010; Setlur et al. 2016; Gao et al. 2015; Hoque et al. 2017; Yu and Silva 2019) that implement NLIs for visualization and process both verbal and nonverbal communication, effectively decoupling the user interface from the user.

In this paper, we discuss our own intelligent assistant, which we claim provides a more supportive environment for data exploration than other systems: the user explores data on a large screen display, interacts with it through free-form natural language (NL) as well as pointing gestures, and can manage multiple visualizations on the screen at once. Our contributions in this paper are:

1) Our assistant is implemented as a dialogue system (such systems are capable of having conversations with humans and have been successful in various domains, for example airline travel (Hemphill, Godfrey, and Doddington 1990; Budzianowski et al. 2018)). It is modeled on our own collected multimodal dialogue data (Kumar et al. 2016; 2017), which captures users interacting with a large screen display while exploring Chicago crime data (the CHICAGO-CRIME-VIS corpus). As a result, rather than making assumptions and enforcing NL templates, our approach is flexible to the different ways in which user queries are phrased. Note that only 15% of our data consists of queries that directly specify actions for the system to take, which we refer to as actionable requests (ARs); the remaining 85% of the spoken utterances do not.

2) Our system is configured to operate on a large screen display, allowing the user to preserve a sizable number of past visualizations on the screen. The user can quickly refer to any of these visualizations when forming subsequent ARs or when analyzing recently constructed ones.

3) The system manages a dialogue history (DH) of past visualizations, which can later be queried to retrieve the visualizations that correspond to references made by the user. For example, "Show me the last heat map" retrieves the most recent entry associated with map plots.

4) We conducted an extensive evaluation of the system in a user study with 20 subjects. The evaluation showed that the system responds to ARs quickly and generates satisfactory (according to user feedback) visualizations for ARs with little ambiguity. One of the primary limitations observed by participants is the system's tendency to interpret an AR too literally, which can lead to a misunderstanding of the user's underlying intent and, consequently, a misleading visualization.
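To make the dialogue-history lookup in contribution 3 concrete, the sketch below shows one way such a retrieval could work. It is a minimal illustration under stated assumptions, not the system's actual implementation: the VisEntry record, its plot_type field, the retrieve_last helper, and the example descriptions are all hypothetical, and we assume the DH is simply a chronological list of the visualizations created so far.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VisEntry:
    """Hypothetical record for one visualization stored in the dialogue history."""
    vis_id: int
    plot_type: str       # e.g., "heat_map", "bar_chart", "line_chart"
    description: str     # short NL summary of what the plot shows

@dataclass
class DialogueHistory:
    """Chronological list of past visualizations (oldest first)."""
    entries: List[VisEntry] = field(default_factory=list)

    def add(self, entry: VisEntry) -> None:
        self.entries.append(entry)

    def retrieve_last(self, plot_type: str) -> Optional[VisEntry]:
        """Return the most recent entry of the requested plot type, if any."""
        for entry in reversed(self.entries):
            if entry.plot_type == plot_type:
                return entry
        return None

# Example: "Show me the last heat map" maps to a lookup by plot type.
dh = DialogueHistory()
dh.add(VisEntry(1, "bar_chart", "Thefts per month in 2015"))
dh.add(VisEntry(2, "heat_map", "Burglaries by community area"))
dh.add(VisEntry(3, "line_chart", "Assaults over time"))

match = dh.retrieve_last("heat_map")
print(match.description if match else "No matching visualization in the history")
# -> Burglaries by community area
```

In the actual system, the reference in the user's utterance would first have to be interpreted, possibly in combination with a pointing gesture, before a lookup of this kind can be issued against the DH.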

[1] George R. Doddington, et al. The ATIS Spoken Language Systems Pilot Corpus, 1990, HLT.

[2] Melanie Tory, et al. How Information Visualization Novices Construct Visualizations, 2010, IEEE Trans. Vis. Comput. Graph.

[3] Vidya Setlur, et al. Applying Pragmatics Principles for Interaction with Visual Analytics, 2018, IEEE Transactions on Visualization and Computer Graphics.

[4] Antoine Raux, et al. The Dialog State Tracking Challenge, 2013, SIGDIAL Conference.

[5] Jason Weston, et al. Learning End-to-End Goal-Oriented Dialog, 2016, ICLR.

[6] Neville Ryant, et al. A large-scale classification of English verbs, 2008, Lang. Resour. Evaluation.

[7] Karrie Karahalios, et al. DataTone: Managing Ambiguity in Natural Language Interfaces for Data Visualization, 2015, UIST.

[8] Bowen Yu, et al. FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System, 2019, IEEE Transactions on Visualization and Computer Graphics.

[9] Ashwani Kumar, et al. Miamm — A Multimodal Dialogue System Using Haptics, 2005.

[10] Abhinav Kumar, et al. Towards a dialogue system that supports rich visualizations of data, 2016, SIGDIAL Conference.

[11] Vidya Setlur, et al. Eviza: A Natural Language Interface for Visual Analysis, 2016, UIST.

[12] Daniel Gildea, et al. The Proposition Bank: An Annotated Corpus of Semantic Roles, 2005, CL.

[13] Joelle Pineau, et al. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems, 2015, SIGDIAL Conference.

[14] Barbara Di Eugenio, et al. Multimodal Coreference Resolution for Exploratory Data Visualization Dialogue: Context-Based Annotation and Gesture Identification, 2017.

[15] David Vandyke, et al. A Network-based End-to-End Trainable Task-oriented Dialogue System, 2016, EACL.

[16] Yiwen Sun, et al. Articulate: A Semi-automated Model for Translating Natural Language Queries into Meaningful Visualizations, 2010, Smart Graphics.

[17] Martha Palmer, et al. Optimization of natural language processing components for robustness and scalability, 2012.

[18] Rafael E. Banchs, et al. The Fourth Dialog State Tracking Challenge, 2016, IWSDS.

[19] Rebecca E. Grinter, et al. A Multi-Modal Natural Language Interface to an Information Visualization Environment, 2001, Int. J. Speech Technol.