Graph patterns: structure, query answering and applications in schema mappings and formal language theory

Graph data appears in a variety of application domains, and many uses of it, such as querying, matching, and transforming data, naturally result in incompletely specified graph data, i.e., graph patterns. Queries need to be posed against such data, but techniques for querying patterns are generally lacking, and even simple properties of graph patterns, such as the languages needed to specify them, are not well understood. In this dissertation we present several contributions to the study of graph patterns. We analyze how to query them and how to use them as queries. We also analyze their applications in two contexts: schema mapping specification and data exchange for graph databases, and formal language theory. We first identify key features of patterns, such as node and label variables and edges specified by regular expressions, and define a classification of patterns based on them. Next we study how to answer standard graph queries over graph patterns, and give precise characterizations of both data and combined complexity for each class of patterns. Where complexity is high, we further analyze the features that lead to intractability, as well as lower-complexity restrictions that guarantee tractability. We then turn to the study of schema mappings for graph databases. As for relational and XML databases, our mapping languages are based on patterns. They subsume all previously considered mapping languages for graph databases, and are capable of expressing many data exchange scenarios in the graph database context. We study the problems of materializing solutions and query answering for data exchange under these mappings, analyze their complexity, and identify relevant classes of mappings and queries for which these problems can be solved efficiently. We also introduce a new model of automata that is based on graph patterns, and define two modes of acceptance for them. We show that this model has applications not only in graph databases but in several other contexts. We study the basic properties of such automata, and the key computational tasks associated with them.
