Learning Join Queries from User Examples

We investigate the problem of learning join queries from user examples. The user is presented with a set of candidate tuples and is asked to label each as a positive or negative example, depending on whether she wants the tuple to be part of the join result. The goal is to quickly infer an arbitrary n-ary join predicate across an arbitrary number m of relations while keeping the number of user interactions as small as possible. We assume no prior knowledge of the integrity constraints across the involved relations. The need to infer a join predicate across multiple relations when the referential constraints are unknown arises in several applications, such as data integration, reverse engineering of database queries, and schema inference. In such scenarios, the number of tuples involved in the join is typically large. We introduce a set of strategies that let us inspect the search space and aggressively prune what we call uninformative tuples, so that we present to the user only the informative ones, that is, those that allow the user to quickly find the goal query she has in mind. In this article, we focus on the inference of joins with equality predicates, and we also allow disjunctive join predicates and projection in the queries. We precisely characterize the frontier between tractability and intractability for the following problems of interest in these settings: consistency checking, learnability, and deciding the informativeness of a tuple. Next, we propose several strategies for choosing the order in which tuples are presented to the user so as to minimize the number of interactions. We show the efficiency of our approach through an experimental study on both benchmark and synthetic datasets.
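To make the notions of consistency checking and tuple informativeness more concrete, the following is a minimal sketch for the simplest setting only: a conjunctive equality join between two relations, with no disjunction and no projection. It is an illustration under stated assumptions, not the strategies or algorithms proposed in the article. The function names (`agreeing_pairs`, `most_specific_predicate`, `is_consistent`, `is_informative`) and the encoding of a join predicate as a set of attribute-position pairs are hypothetical choices made for this example.

```python
from itertools import product

# Assumption: a candidate tuple is a pair (r, s) with r from R and s from S.
# Assumption: a join predicate is a set of position pairs (i, j), each read
# as the equality R[i] = S[j]; the predicate is the conjunction of all pairs.


def agreeing_pairs(r, s):
    """Return every position pair (i, j) on which r and s hold equal values."""
    return frozenset(
        (i, j) for i, j in product(range(len(r)), range(len(s))) if r[i] == s[j]
    )


def most_specific_predicate(positives, arity_r, arity_s):
    """Intersect the agreeing pairs of all positive examples.

    With no positive examples, this is the set of all position pairs.
    """
    theta = frozenset(product(range(arity_r), range(arity_s)))
    for r, s in positives:
        theta &= agreeing_pairs(r, s)
    return theta


def is_consistent(positives, negatives, arity_r, arity_s):
    """A predicate consistent with the labels exists iff the most specific
    predicate rejects every negative example."""
    theta = most_specific_predicate(positives, arity_r, arity_s)
    return all(not theta <= agreeing_pairs(r, s) for r, s in negatives)


def is_informative(candidate, positives, negatives, arity_r, arity_s):
    """A candidate is informative iff consistent predicates disagree on it.

    Assumes the labeled sample is consistent.
    """
    theta = most_specific_predicate(positives, arity_r, arity_s)
    t = agreeing_pairs(*candidate)
    if theta <= t:
        return False  # selected by every consistent predicate
    if any((theta & t) <= agreeing_pairs(r, s) for r, s in negatives):
        return False  # rejected by every consistent predicate
    return True


# Toy data (made up for illustration): R and S both have two attributes.
positives = [((1, 2), (1, 2))]
negatives = [((1, 3), (1, 4))]

print(is_consistent(positives, negatives, 2, 2))
print(is_informative(((5, 6), (5, 6)), positives, negatives, 2, 2))
print(is_informative(((7, 8), (9, 8)), positives, negatives, 2, 2))
```

In this toy sample, the candidate `((5, 6), (5, 6))` agrees on every pair that the positive example agrees on, so its label is already determined and asking the user about it would waste an interaction. The candidate `((7, 8), (9, 8))` is selected by some consistent predicates and rejected by others, so its label would narrow the search space.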
