Interactive Inference of Join Queries

We investigate the problem of inferring join queries from user interactions. The user is presented with a set of candidate tuples and is asked to label them as positive or negative, depending on whether or not she would like the tuples to be part of the join result. The goal is to quickly infer an arbitrary n-ary join predicate across two relations while keeping the number of user interactions as small as possible. We assume no prior knowledge of the integrity constraints between the involved relations. This kind of scenario occurs in several application settings, such as data integration, reverse engineering of database queries, and constraint inference. In such scenarios, the database instances may be too big to be skimmed. We explore the search space using a set of strategies that let us prune what we call "uninformative" tuples and directly present to the user the informative ones, i.e., those that allow us to quickly find the goal query that the user has in mind. In this paper, we focus on the inference of joins with equality predicates, and we show that for such joins, deciding whether a tuple is uninformative can be done in polynomial time. Next, we propose several strategies for presenting tuples to the user in an order that minimizes the number of interactions. We show the efficiency and scalability of our approach through an experimental study on both benchmark and synthetic datasets. Finally, we prove that adding projection to our queries makes the problem intractable.
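To make the notion of an uninformative tuple concrete, the following Python sketch shows one way such a check could look for equality joins. It rests on assumptions that the abstract does not spell out: a join predicate is modeled as a set of attribute-index pairs (i, j) meaning "attribute i of the first relation equals attribute j of the second", a candidate is a pair of tuples (one from each relation), and a candidate counts as uninformative when every predicate consistent with the labels given so far assigns it the same label. The names `signature` and `is_uninformative` are hypothetical and are not taken from the paper.

```python
from itertools import product


def signature(r, p):
    """Return the set of index pairs (i, j) such that r[i] == p[j]."""
    return frozenset(
        (i, j)
        for (i, a), (j, b) in product(enumerate(r), enumerate(p))
        if a == b
    )


def is_uninformative(candidate, positives, negatives, n_r, n_p):
    """Check whether labeling `candidate` could still rule out any predicate.

    candidate: a pair (r, p) of tuples, one from each relation.
    positives, negatives: lists of already labeled pairs (assumed consistent).
    n_r, n_p: arities of the two relations.
    """
    # Most specific predicate consistent with the positive examples:
    # the equalities shared by all of them (all pairs if there are none).
    t_plus = frozenset(product(range(n_r), range(n_p)))
    for r, p in positives:
        t_plus &= signature(r, p)

    sig = signature(*candidate)

    # Certain positive: every predicate contained in t_plus selects the candidate.
    if t_plus <= sig:
        return True

    # Certain negative: any predicate selecting the candidate would also
    # select some tuple already labeled negative.
    overlap = t_plus & sig
    return any(overlap <= signature(r, p) for r, p in negatives)
```

Under these assumptions the check uses only set intersections and inclusions over at most n_r × n_p attribute pairs per labeled example, so it runs in time polynomial in the number of attributes and labeled tuples, which is in line with the tractability claim above.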
