Paper Information - Quda: Natural Language Queries for Visual Data Analytics

Visualization-oriented natural language interfaces (V-NLIs) have been explored and developed in recent years. One challenge for V-NLIs is making effective design decisions, which usually requires a deep understanding of user queries. Learning-based approaches have shown potential in V-NLIs and have reached state-of-the-art performance in various NLP tasks. However, because training samples that cater to visual data analytics are scarce, cutting-edge techniques have rarely been employed in the development of V-NLIs. We present a new dataset, called Quda, to help V-NLIs understand free-form natural language. The dataset contains 14,035 diverse user queries annotated with 10 low-level analytic tasks, which supports the deployment of state-of-the-art techniques for parsing complex human language. We build it by first gathering seed queries from data analysts, who are the target users of V-NLIs, and then employing a large crowd workforce for paraphrase generation and validation. We demonstrate the usefulness of Quda in building V-NLIs by creating a prototype that makes effective design decisions for free-form user queries. We also show that Quda can benefit a wide range of applications in the visualization community by analyzing the design tasks described in academic publications.
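
As a rough illustration of what "queries annotated with 10 low-level analytic tasks" could look like in practice, the following minimal sketch encodes one annotated query as a multi-hot label vector. The task names follow the low-level task taxonomy of Amar, Eagan, and Stasko (2005). The `AnnotatedQuery` class, the `to_multi_hot` helper, and the example query are hypothetical and are not taken from the Quda dataset or its release format; the sketch also assumes that a query may carry more than one task label.

```python
from dataclasses import dataclass

# The ten low-level analytic tasks (Amar, Eagan, and Stasko, 2005).
TASKS = [
    "Retrieve Value",
    "Filter",
    "Compute Derived Value",
    "Find Extremum",
    "Sort",
    "Determine Range",
    "Characterize Distribution",
    "Find Anomalies",
    "Cluster",
    "Correlate",
]


@dataclass(frozen=True)
class AnnotatedQuery:
    """Hypothetical record: one free-form query plus its task labels."""
    text: str
    tasks: frozenset


def to_multi_hot(query):
    """Return a 0/1 vector with one position per entry in TASKS."""
    unknown = query.tasks - set(TASKS)
    if unknown:
        raise ValueError(f"unknown task labels: {sorted(unknown)}")
    return [1 if task in query.tasks else 0 for task in TASKS]


# Made-up example query, for illustration only (not from the dataset).
example = AnnotatedQuery(
    text="Which five movies earned the most in 2010?",
    tasks=frozenset({"Filter", "Find Extremum"}),
)
vector = to_multi_hot(example)
```

A vector of this shape is the usual target format for a multi-label text classifier, which is one plausible way such annotations could feed a learning-based V-NLI.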
