Towards Enhancing Database Education: Natural Language Generation Meets Query Execution Plans

The database systems course is offered as part of an undergraduate computer science degree program in many major universities. A key learning goal of learners taking such a course is to understand how SQL queries are processed in an RDBMS in practice. Since a query execution plan (QEP) describes the execution steps of a query, learners can acquire this understanding by perusing the QEPs generated by an RDBMS. Unfortunately, in practice, it is often daunting for a learner to comprehend these QEPs, which contain vendor-specific implementation details, and this hinders her learning process. In this paper, we present a novel, end-to-end, generic system called LANTERN that generates a natural language description of a QEP to facilitate understanding of the query execution steps. It takes as input an SQL query and its QEP, and generates a natural language description of the execution strategy deployed by the underlying RDBMS. Specifically, it deploys a declarative framework called POOL that enables subject matter experts to efficiently create and maintain natural language descriptions of the physical operators used in QEPs. A rule-based framework called RULE-LANTERN is proposed that exploits POOL to generate natural language descriptions of QEPs. Despite the high accuracy of RULE-LANTERN, our engagement with learners reveals that, consistent with existing psychology theories, perusing such rule-based descriptions leads to boredom due to repetitive statements across different QEPs. To address this issue, we present a novel deep learning-based language generation framework called NEURAL-LANTERN that infuses language variability into the generated description by exploiting a set of paraphrasing tools and word embeddings. Our experimental study with real learners shows the effectiveness of LANTERN in facilitating comprehension of QEPs.
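
To make the rule-based idea concrete, the following is a minimal sketch of how a QEP tree might be turned into step-by-step sentences using per-operator templates. It is not the LANTERN, POOL, or RULE-LANTERN implementation: the `TEMPLATES` table, the `describe` and `narrate` functions, and the example plan are all hypothetical. The dictionary keys (`Node Type`, `Relation Name`, `Plans`, `Hash Cond`, `Filter`) are assumed to follow the shape of PostgreSQL's `EXPLAIN (FORMAT JSON)` output.

```python
# Illustrative sketch only: hypothetical templates, not the POOL language.
TEMPLATES = {
    "Seq Scan": "perform a sequential scan on table {rel}{cond}",
    "Index Scan": "perform an index scan on table {rel}{cond}",
    "Hash": "build a hash table on the intermediate result",
    "Hash Join": "perform a hash join on the intermediate results{cond}",
    "Sort": "sort the intermediate result",
}

# Fallback wording for operators that have no template.
DEFAULT_TEMPLATE = "apply the {op} operator"


def describe(plan):
    """Return one sentence fragment per operator, children first (post-order)."""
    steps = []
    for child in plan.get("Plans", []):
        steps.extend(describe(child))
    node_type = plan["Node Type"]
    template = TEMPLATES.get(node_type, DEFAULT_TEMPLATE)
    # Use the join condition if present, otherwise a filter predicate.
    cond = plan.get("Hash Cond") or plan.get("Filter")
    steps.append(
        template.format(
            rel=plan.get("Relation Name", "an unnamed relation"),
            op=node_type,
            cond=f" with condition {cond}" if cond else "",
        )
    )
    return steps


def narrate(plan):
    """Join the fragments into a numbered natural language description."""
    return " ".join(
        f"Step {i}: {step}." for i, step in enumerate(describe(plan), start=1)
    )


# Hypothetical plan for a two-table join (table and column names are made up).
example_plan = {
    "Node Type": "Hash Join",
    "Hash Cond": "(o.cid = c.id)",
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders"},
        {
            "Node Type": "Hash",
            "Plans": [{"Node Type": "Seq Scan", "Relation Name": "customer"}],
        },
    ],
}

if __name__ == "__main__":
    print(narrate(example_plan))
```

Because every operator of a given type maps to the same fixed template, descriptions produced this way repeat the same wording across different plans, which is the kind of repetition the neural component described above is meant to reduce.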
