Collecting and Characterizing Natural Language Utterances for Specifying Data Visualizations

Natural language interfaces (NLIs) for data visualization are becoming increasingly popular both in academic research and in commercial software. Yet, there is a lack of empirical understanding of how people specify visualizations through natural language. We conducted an online study (N = 102), showing participants a series of visualizations and asking them to provide utterances they would pose to generate the displayed charts. From the responses, we curated a dataset of 893 utterances and characterized the utterances according to (1) their phrasing (e.g., commands, queries, questions) and (2) the information they contained (e.g., chart types, data aggregations). To help guide future research and development, we contribute this utterance dataset and discuss its applications toward the creation and benchmarking of NLIs for visualization.
