SQL Generation from Natural Language Using Supervised Learning and Recurrent Neural Networks

Databases store a vast amount of today’s data, and accessing it requires users to have command of SQL or an equivalent interface language. A system that converts natural language into an equivalent SQL query would therefore make that data more accessible. Building natural language interfaces to relational databases is an important, challenging, and widely studied problem in natural language processing (NLP), and it has recently regained momentum with the introduction of large-scale datasets. In this paper, we present our approach, which is based on word embeddings and Recurrent Neural Networks (RNNs), specifically Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) cells. We also present the dataset used to train and test our models, which is based on WikiSQL, and finally report the accuracy we achieved.
