Paper information - Embedding Individual Table Columns for Resilient SQL Chatbots

Most of the world's data is stored in relational databases. Accessing these requires specialized knowledge of the Structured Query Language (SQL), putting them out of the reach of many people. A recent research thread in Natural Language Processing (NLP) aims to alleviate this problem by automatically translating natural language questions into SQL queries. While the proposed solutions are a great start, they lack robustness and do not easily generalize: the methods require high-quality descriptions of the database table columns, and the most widely used training dataset, WikiSQL, is heavily biased towards using those descriptions as part of the questions. In this work, we propose solutions to both problems: we entirely eliminate the need for column descriptions by relying solely on the columns' contents, and we augment the WikiSQL dataset by paraphrasing column names to reduce bias. We show that the accuracy of existing methods drops when trained on our augmented, column-agnostic dataset, and that our own method reaches state-of-the-art accuracy while relying on column contents only.
