On the Difference between Navigating Semi-structured Data and Querying It

Currently, there is tremendous interest in semi-structured (SS) data management. This is spurred by data sources, such as ACeDB [29], that are inherently less rigidly structured than traditional DBMSs, by WWW documents where no hard rules or constraints are imposed and “anything goes,” and by the integration of information coming from disparate sources that differ considerably in the way they structure information. Significant strides have been made in the development of data models and query languages [2, 11, 17, 6, 7] and, to some extent, in the theory of queries on semi-structured data [1, 23, 3, 13, 9]. The OEM model of the Stanford TSIMMIS project [2] (equivalently, its variant, independently developed at U. Penn. [11]) has emerged as the de facto standard model for semi-structured data. OEM is a light-weight object model, which, unlike the ODMG model that it extends, does not impose the latter’s rigid type constraints. Both OEM and the Penn model essentially correspond to labeled digraphs. A main theme emerging from the popular query languages, such as Lorel [2], UnQL [11], StruQL [17], WebOQL [6], and the Ulixes/Penelope pair of the ADM model [7], is that navigation is considered an integral and essential part of querying. Indeed, given the lack of rigid schema of semi-structured data, navigation brings many benefits, including the ability to retrieve data regardless of the depth at which it resides in a tree (e.g., see [4]). This is achieved with programming primitives such as regular path expressions and wildcards. A second, somewhat subtle, aspect of the emerging trend is that query expressions are often dependent on the particular instance they are applied to. This is not surprising, given the lack of rigid structure and the absence of the notion of a predefined schema for semi-structured data. In fact, it has been argued [4] that it is unreasonable to impose a predefined schema.
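To make the navigational style concrete, the following is a minimal sketch, not taken from any of the cited systems, of an OEM-like instance represented as a labeled digraph, together with a lookup that behaves like a wildcard path followed by one fixed label. The instance data, the names `EDGES` and `VALUES`, and the function `find_at_any_depth` are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical OEM-style instance: node id -> list of (edge label, target id).
# Atomic values are kept in a separate map. All names are illustrative.
EDGES = {
    "root": [("guide", "g1")],
    "g1": [("restaurant", "r1"), ("restaurant", "r2")],
    "r1": [("name", "n1"), ("address", "a1")],
    "a1": [("zipcode", "z1")],
    "r2": [("name", "n2"), ("zipcode", "z2"), ("nearby", "r1")],
}
VALUES = {"n1": "Chef Chu", "n2": "Saigon", "z1": "94040", "z2": "94301"}


def find_at_any_depth(edges, start, label):
    """Return the targets of every `label` edge reachable from `start`.

    This mimics a path expression of the form <any path>.label, that is,
    a wildcard over arbitrary depth followed by a single fixed label.
    """
    seen = {start}
    queue = deque([start])
    hits = []
    while queue:
        node = queue.popleft()
        for edge_label, target in edges.get(node, []):
            if edge_label == label and target not in hits:
                hits.append(target)
            if target not in seen:  # guard against cycles in the digraph
                seen.add(target)
                queue.append(target)
    return hits


zipcodes = [VALUES[n] for n in find_at_any_depth(EDGES, "root", "zipcode")]
print(zipcodes)
```

In this toy instance one restaurant stores its zip code directly and the other nests it under an address, yet the same lookup finds both. Note that the search is written against edge labels only, so its result depends on which labels happen to occur in the particular instance it is run on.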

[1] François Bancilhon, et al. On the Completeness of Query Languages for Relational Data Bases, 1978, MFCS.

[2] Jan Van den Bussche, et al. Type inference in the polymorphic relational algebra, 1999, PODS '99.

[3] Jennifer Widom, et al. The Lorel query language for semistructured data, 1997, International Journal on Digital Libraries.

[4] Jeffrey D. Ullman, et al. Integrating information by outerjoins and full disjunctions (extended abstract), 1996, PODS.

[5] Dan Suciu, et al. Query containment for conjunctive queries with regular expressions, 1998, PODS.

[6] Richard Hull, et al. Managing semantic heterogeneity in databases: a theoretical prospective, 1997, PODS.

[7] Marc Andries, et al. On Instance-Completeness for Database Query Languages involving Object Creation, 1996, J. Comput. Syst. Sci.

[8] Jan Van den Bussche, et al. On the completeness of object-creating database transformation languages, 1997, JACM.

[9] C. M. Sperberg-McQueen, et al. Extensible Markup Language (XML), 1997, World Wide Web J.

[10] Anand Rajaraman, et al. Integrating Information by Outerjoins and Full Disjunctions, 1996, PODS 1996.

[11] Serge Abiteboul, et al. Extracting schema from semistructured data, 1998, SIGMOD '98.

[12] Jan Paredaens, et al. On the Expressive Power of the Relational Algebra, 1978, Inf. Process. Lett.

[13] Svetlozar Nestorov, et al. Extracting Schema from Semistructured Data, 1998.

[14] John McCarthy, et al. Mathematical Theory of Computation, 1991.

[15] Serge Abiteboul, et al. Queries and computation on the web, 1997, Theor. Comput. Sci.

[16] Jeffrey D. Ullman, et al. Database theory—past and future, 1987, PODS.

[17] C. M. Sperberg-McQueen, et al. eXtensible Markup Language (XML) 1.0 (Second Edition), 2000.

[18] Dan Suciu, et al. A query language for a Web-site management system, 1997, SGMD.

[19] Serge Abiteboul, et al. Regular path queries with constraints, 1997, J. Comput. Syst. Sci.

[20] Catriel Beeri, et al. Schemas for Integration and Translation of Structured and Semi-structured Data, 1999, ICDT.

[21] Alberto O. Mendelzon, et al. WebOQL: restructuring documents, databases and Webs, 1998, Proceedings 14th International Conference on Data Engineering.

[22] Witold Lipski, et al. Nonapplicable Nulls, 1986, Theor. Comput. Sci.

[23] Peter Buneman, et al. Polymorphism and type inference in database programming, 1996, TODS.

[24] David Harel, et al. Computable Queries for Relational Data Bases, 1980, J. Comput. Syst. Sci.

[25] Laks V. S. Lakshmanan, et al. Tables as a paradigm for querying and restructuring (extended abstract), 1996, PODS '96.

[26] Serge Abiteboul, et al. Querying Semi-Structured Data, 1997, Encyclopedia of Database Systems.

[27] Alberto O. Mendelzon, et al. WebOQL: restructuring documents, databases, and webs, 1999.

[28] Dan Suciu, et al. Adding Structure to Unstructured Data, 1997, ICDT.

[29] Dan Suciu, et al. A query language and optimization techniques for unstructured data, 1996, SIGMOD '96.

[30] Alberto O. Mendelzon, et al. Formal models of Web queries, 1997, Inf. Syst.

[31] Paolo Merialdo, et al. To Weave the Web, 1997, VLDB.