Labeling RDF Graphs for Linear Time and Space Querying

Indices and data structures for web querying have mostly considered tree-shaped data, reflecting the view of XML documents as trees. For RDF, however (and for XML when ID/IDREF constraints are queried), data is indisputably graph-shaped. In this chapter, we first survey existing indexing and labeling schemes for RDF and other graph data, with a focus on support for efficient adjacency and reachability queries. Labeling schemes have been an important factor in the widespread adoption of XML, in particular for mapping XML to existing (relational) database technology. However, the existing indexing and labeling schemes for RDF (and graph data in general) sacrifice one of the most attractive properties of XML labeling schemes: the constant-time, constant per-node-space test for adjacency (child) and reachability (descendant). In the second part, we introduce the first labeling scheme for RDF data that retains this property and thus achieves linear time and space processing of acyclic RDF queries on a significantly larger class of graphs than previous approaches (which are mostly limited to tree-shaped data). Finally, we show how this labeling scheme can be applied to (acyclic) SPARQL queries to obtain an evaluation algorithm with time and space complexity linear in the number of resources in the queried RDF graph.
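
To make the constant-time adjacency and reachability tests concrete, the following is a minimal sketch of a classic interval labeling for the tree-shaped (XML) case only. It is not the labeling scheme for RDF graphs introduced in the chapter; the function names `label_tree`, `is_descendant`, and `is_child` and the dictionary-based tree representation are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, int, int]  # (start, end, depth)


def label_tree(children: Dict[str, List[str]], root: str) -> Dict[str, Label]:
    """Assign each node a (start, end, depth) label in one depth-first pass.

    Assumes `children` describes a tree: every node has at most one parent.
    """
    labels: Dict[str, Label] = {}
    start: Dict[str, int] = {}
    counter = 0
    # Each stack entry is (node, depth, finished?).
    stack = [(root, 0, False)]
    while stack:
        node, depth, finished = stack.pop()
        if finished:
            # All descendants are labeled; close the interval.
            labels[node] = (start[node], counter, depth)
            counter += 1
        else:
            start[node] = counter
            counter += 1
            stack.append((node, depth, True))
            # Push children in reverse so they are visited left to right.
            for child in reversed(children.get(node, [])):
                stack.append((child, depth + 1, False))
    return labels


def is_descendant(labels: Dict[str, Label], anc: str, desc: str) -> bool:
    """Reachability: desc's interval lies strictly inside anc's interval."""
    a_start, a_end, _ = labels[anc]
    d_start, d_end, _ = labels[desc]
    return a_start < d_start and d_end < a_end


def is_child(labels: Dict[str, Label], parent: str, child: str) -> bool:
    """Adjacency: a descendant exactly one level below the parent."""
    return (is_descendant(labels, parent, child)
            and labels[child][2] == labels[parent][2] + 1)


# Hypothetical example tree: a -> (b, c), b -> (d)
tree = {"a": ["b", "c"], "b": ["d"]}
labels = label_tree(tree, "a")
```

Each test compares a fixed number of integers, so it runs in constant time, and each node stores a fixed number of integers. This containment test relies on every node having a single parent, which is why it does not carry over directly to graph-shaped data.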
