A Paradigm for Learning Queries on Big Data

Specifying a database query in a formal query language is typically challenging for non-expert users. In the context of big data, the problem becomes even harder, because users must deal with database instances that are large and hence difficult to visualize. Such instances usually lack a schema that could help users specify their queries, or have only an incomplete schema because they come from disparate data sources. In this paper, we propose a novel paradigm for interactive learning of queries on big data that assumes no knowledge of the database schema. The paradigm can be applied to different database models, each with a class of queries suited to that model. In particular, we present two instantiations that validate the proposed paradigm: learning relational join queries and learning path queries on graph databases. Finally, we discuss the challenges of applying the paradigm to further data models and to learning cross-model schema mappings.
