Learning Path Queries on Graph Databases

We investigate the problem of learning graph queries from user examples. The input is a graph database in which the user has labeled a few nodes as positive or negative examples, depending on whether or not she wants them in the query result. Our goal is to use these examples to find a query whose output matches what the user expects. This scenario is pivotal in application settings where non-expert users need assistance in specifying their queries. In this paper, we focus on path queries defined by regular expressions: we identify fundamental difficulties of our problem setting, formalize what it means for a class of queries to be learnable, and prove that the class of queries under study enjoys this property. We additionally investigate an interactive scenario in which we start with an empty set of examples and identify the informative nodes, i.e., those that contribute to the learning process. We then ask the user to label these nodes and iterate the learning process until she is satisfied with the learned query. Finally, we present an experimental study on both real and synthetic datasets that gauges the effectiveness of our learning algorithm and the improvement brought by the interactive approach.
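
To make the setting concrete, the following is a minimal Python sketch of what it means for a candidate path query to be consistent with labeled nodes. It is an illustration under stated assumptions, not the learning algorithm of the paper: the toy graph `EDGES`, the helper names `path_words`, `selects`, and `consistent`, the use of one-character edge labels, the bound `max_len` on path length, and the semantics "a node is selected if at least one non-empty outgoing path spells a word matched by the regular expression" are all assumptions made for this example.

```python
import re

# Hypothetical toy graph: (source, label, target) edges with one-character labels.
EDGES = [
    ("n1", "a", "n2"),
    ("n2", "b", "n3"),
    ("n3", "a", "n1"),
    ("n4", "b", "n4"),
]


def path_words(edges, start, max_len):
    """Yield the label word of every path of length 1..max_len leaving `start`."""
    frontier = [(start, "")]
    for _ in range(max_len):
        next_frontier = []
        for node, word in frontier:
            for src, label, dst in edges:
                if src == node:
                    next_frontier.append((dst, word + label))
                    yield word + label
        frontier = next_frontier


def selects(edges, regex, node, max_len=4):
    """Assumed semantics: `node` is selected if some bounded path word fully matches."""
    pattern = re.compile(regex)
    return any(pattern.fullmatch(word) for word in path_words(edges, node, max_len))


def consistent(edges, regex, positives, negatives, max_len=4):
    """A query is consistent if it selects every positive and no negative node."""
    return (all(selects(edges, regex, n, max_len) for n in positives)
            and not any(selects(edges, regex, n, max_len) for n in negatives))


if __name__ == "__main__":
    print(consistent(EDGES, "ab", ["n1"], ["n4"]))   # candidate query "ab"
    print(consistent(EDGES, "b+", ["n1"], ["n4"]))   # candidate query "b+"
```

Bounding the path length keeps the sketch short but is a simplification: a faithful evaluator would instead explore the product of the graph with an automaton for the regular expression, so that paths of arbitrary length are covered.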
