Table2Analysis: Modeling and Recommendation of Common Analysis Patterns for Multi-Dimensional Data

Given a table of multi-dimensional data, what analyses would human create to extract information from it? From scientific exploration to business intelligence (BI), this is a key problem to solve towards automation of knowledge discovery and decision making. In this paper, we propose Table2Analysis to learn commonly conducted analysis patterns from large amount of (table, analysis) pairs, and recommend analyses for any given table even not seen before. Multi-dimensional data as input challenges existing model architectures and training techniques to fulfill the task. Based on deep Q-learning with heuristic search, Table2Analysis does table to sequence generation, with each sequence encoding an analysis. Table2Analysis has 0.78 recall at top-5 and 0.65 recall at top-1 in our evaluation against a large scale spreadsheet corpus on the PivotTable recommendation task.

[1]  Patrick Marcel,et al.  A survey of query recommendation techniques for data warehouse exploration , 2011, EDA.

[2]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[3]  Tao Yu,et al.  TypeSQL: Knowledge-Based Type-Aware Neural Text-to-SQL Generation , 2018, NAACL.

[4]  Gökhan BakIr,et al.  Predicting Structured Data , 2008 .

[5]  Yong Xu,et al.  QuickInsights: Quick and Automatic Discovery of Insights from Multi-Dimensional Data , 2019, SIGMOD Conference.

[6]  Richard Socher,et al.  Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning , 2018, ArXiv.

[7]  Samy Bengio,et al.  Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.

[8]  Zhifang Sui,et al.  Table-to-text Generation by Structure-aware Seq2seq Learning , 2017, AAAI.

[9]  Sebastian Nowozin,et al.  DeepCoder: Learning to Write Programs , 2016, ICLR.

[10]  Hang Li,et al.  “ Tony ” DNN Embedding for “ Tony ” Selective Read for “ Tony ” ( a ) Attention-based Encoder-Decoder ( RNNSearch ) ( c ) State Update s 4 SourceVocabulary Softmax Prob , 2016 .

[11]  Di He,et al.  Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation , 2018, NeurIPS.

[12]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[13]  Ronald J. Williams,et al.  A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.

[14]  Marc'Aurelio Ranzato,et al.  Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.

[15]  Martial Hebert,et al.  Improving Multi-Step Prediction of Learned Time Series Models , 2015, AAAI.

[16]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[17]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[18]  Tova Milo,et al.  Next-Step Suggestions for Modern Interactive Data Analysis Platforms , 2018, KDD.

[19]  Michael Alexander,et al.  Pivot Table Data Crunching , 2001 .

[20]  Yoshua Bengio,et al.  Professor Forcing: A New Algorithm for Training Recurrent Networks , 2016, NIPS.

[21]  Sumit Gulwani,et al.  Neural-Guided Deductive Search for Real-Time Program Synthesis from Examples , 2018, ICLR.

[22]  Man Lung Yiu,et al.  Extracting Top-K Insights from Multi-dimensional Data , 2017, SIGMOD Conference.

[23]  Yiming Yang,et al.  XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.

[24]  Alexander M. Rush,et al.  Challenges in Data-to-Document Generation , 2017, EMNLP.

[25]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.