Talk2Data: High-Level Question Decomposition for Data-Oriented Question and Answering

Through a data-oriented question and answering system, users can directly “ask” the system for answers to their analytical questions about input tabular data. This greatly improves the user experience and lowers the technical barriers to data analysis. Existing techniques focus on providing a concrete query for users or on resolving the ambiguities in a specific question so that the system can better understand it and provide more correct and precise answers. However, users who have little knowledge about the data find it difficult to ask concrete questions. Instead, they frequently ask high-level questions, which existing techniques cannot easily answer. To address this issue, we introduce Talk2Data, a data-oriented online question and answering system that supports both low-level and high-level questions. It leverages a novel deep-learning model to resolve a high-level question into a series of low-level questions that can be answered by data facts. These low-level questions can be used to gradually elaborate the users’ requirements. We design a set of annotated and captioned visualizations to represent the answers in a form that supports interpretation and narration. We evaluate Talk2Data through case studies, performance validation, and a controlled user study. The results demonstrate the effectiveness of the system.
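To make the idea of resolving a high-level question into low-level questions answered by data facts more concrete, the following is a minimal illustrative sketch in Python. It is not the Talk2Data implementation: the `SALES` table, the `LowLevelQuestion` type, and the hand-written `decompose` function are all hypothetical, and `decompose` merely stands in for the deep-learning model described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class LowLevelQuestion:
    """A concrete question paired with a function that computes its data fact."""
    text: str
    answer: Callable[[List[Row]], object]


# Hypothetical input table (not taken from the paper).
SALES: List[Row] = [
    {"region": "East", "year": 2019, "sales": 120},
    {"region": "East", "year": 2020, "sales": 150},
    {"region": "West", "year": 2019, "sales": 200},
    {"region": "West", "year": 2020, "sales": 170},
]


def total_by(rows: List[Row], key: str, value: str) -> Dict[object, int]:
    """Sum the `value` column for each distinct entry of the `key` column."""
    totals: Dict[object, int] = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


def decompose(high_level_question: str) -> List[LowLevelQuestion]:
    """Hand-written stand-in for the learned decomposition model (assumption).

    The input question is ignored here; a real model would condition on it.
    """
    return [
        LowLevelQuestion(
            "What are the total sales of each region?",
            lambda rows: total_by(rows, "region", "sales"),
        ),
        LowLevelQuestion(
            "What are the total sales in each year?",
            lambda rows: total_by(rows, "year", "sales"),
        ),
        LowLevelQuestion(
            "Which region has the highest total sales?",
            lambda rows: max(
                total_by(rows, "region", "sales").items(), key=lambda kv: kv[1]
            )[0],
        ),
    ]


def answer(high_level_question: str, rows: List[Row]) -> List[Tuple[str, object]]:
    """Answer a high-level question as a list of (low-level question, data fact)."""
    return [(q.text, q.answer(rows)) for q in decompose(high_level_question)]


if __name__ == "__main__":
    for text, fact in answer("How are the sales doing?", SALES):
        print(f"{text} -> {fact}")
```

In this sketch each low-level question maps to a single aggregate over the table, which is one simple way to read "answered by data facts"; how the actual system defines and visualizes facts is not specified in the abstract.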
