Four Lessons in Versatility or How Query Languages Adapt to the Web

Exposing not only human-centered information but also machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled new kinds of applications and businesses in which data is used in ways not foreseen by the data providers. Yet this exposure has fractured the Web into islands of data, each in a different Web format: some providers choose XML, others RDF, still others JSON or OWL, even for data in similar domains. This fracturing stifles innovation, as application builders have to cope not with one Web stack (e.g., XML technology) but with several, each of considerable complexity. With Xcerpt we have developed a rule- and pattern-based query language that aims to shield application builders from much of this complexity: in a single query language, XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C's GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply to querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented by reusing traditional database technology, if desired. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet also provides linear time and space querying for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards more convenient, yet highly efficient, data access in a "Web of Data".
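To make the notion of an interval labeling concrete, the following is a minimal sketch of the classic labeling for tree-shaped data only, not the graph labeling developed for Xcerpt. It assumes a tree given as a dictionary from node name to child list; the function names `interval_labels` and `is_descendant` and the sample document are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, Tuple[int, int]]


def interval_labels(tree: Dict[str, List[str]], root: str) -> Labels:
    """Assign each node a (start, end) interval by depth-first traversal.

    The interval of a node strictly encloses the intervals of all of its
    descendants, so structural relations reduce to integer comparisons.
    """
    labels: Labels = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        start = counter          # number handed out when entering the node
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        labels[node] = (start, counter)  # number handed out when leaving
        counter += 1

    visit(root)
    return labels


def is_descendant(labels: Labels, node: str, ancestor: str) -> bool:
    """True if `node` lies strictly below `ancestor` in the tree."""
    start, end = labels[node]
    anc_start, anc_end = labels[ancestor]
    return anc_start < start and end < anc_end


# Hypothetical XML-like document: bib -> (book -> (title, author), article)
doc = {"bib": ["book", "article"], "book": ["title", "author"], "article": []}
labels = interval_labels(doc, "bib")
```

Labeling takes one pass over the tree, and each descendant test afterwards is two integer comparisons. Graph-shaped RDF data does not nest this cleanly in general, which is the case the unified evaluation approach addresses.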
