Querying Graphs with Data

Graph databases have received much attention lately because of the many applications in which data is naturally viewed as a graph, including social networks, RDF and the Semantic Web, and biological databases. The many proposed query languages for graph databases fall mainly into two categories. One views graphs as a particular kind of relational data and uses traditional relational mechanisms for querying. The other concentrates on querying the topology of the graph. Neither approach, however, can combine data and topology, which is what is needed for queries that ask how data changes along the paths and patterns that contain it. In this article, we present a comprehensive study of languages that enable such a combination of data and topology querying. These languages come in two flavors. The first follows the standard approach of path queries, which specify how edge labels change along a path, and extends them with ways of specifying how both labels and data change. From the point of view of complexity, the right formalisms are subclasses of register automata. These, however, are not well suited for querying. To overcome this, we develop several types of extended regular expressions that specify paths with data, and we study their querying power and complexity. The second approach adopts the popular XML language XPath and extends it from XML documents to graphs. Depending on the exact set of allowed features, this yields a family of languages, and our study shows that the family includes efficient and highly expressive formalisms for querying both the structure of the data and the data itself.
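To make the idea of a path query over both labels and data concrete, here is a minimal sketch in Python. It is not the article's formalism or algorithm. It assumes a toy data graph in which every node carries a single data value, and it evaluates one fixed kind of query: pairs of nodes joined by a nonempty path of edges with a given label whose endpoints carry the same data value. The graph, the node names, and the function `same_value_pairs` are hypothetical and chosen only for illustration. The single variable `register` mimics, very loosely, how a register automaton stores a data value and compares it later.

```python
from collections import deque

# Hypothetical toy data graph: each node carries one data value,
# each edge carries a label from a finite alphabet.
data = {"n1": 5, "n2": 7, "n3": 5, "n4": 7}
edges = [("n1", "a", "n2"), ("n2", "a", "n3"), ("n3", "b", "n4")]


def same_value_pairs(data, edges, label):
    """Return pairs (u, v) joined by a nonempty path of `label` edges
    whose endpoints carry the same data value."""
    # Keep only the edges with the requested label.
    succ = {}
    for src, lab, dst in edges:
        if lab == label:
            succ.setdefault(src, []).append(dst)

    answers = set()
    for start in data:
        register = data[start]          # store the data value of the start node
        seen = set()
        queue = deque(succ.get(start, []))
        while queue:                    # breadth-first search along `label` edges
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            if data[node] == register:  # compare against the stored value
                answers.add((start, node))
            queue.extend(succ.get(node, []))
    return answers


result = same_value_pairs(data, edges, "a")
print(sorted(result))
```

A purely topological query could report that n3 is reachable from n1 by a-labeled edges, and a purely relational one could report that n1 and n3 hold equal values; the sketch checks both conditions along the same path, which is the combination the abstract describes.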
