DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases

In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are useful for browsing database structure, formulating queries, storing information such as statistics and sample values, and enabling query optimization. This paper presents the theoretical foundations of DataGuides along with an algorithm for their creation and an overview of incremental maintenance. We provide performance results based on our implementation of DataGuides in the Lore DBMS for semistructured data. We also describe the use of DataGuides in Lore, both in the user interface to enable structure browsing and query formulation, and as a means of guiding the query processor and optimizing query execution.
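The abstract mentions an algorithm for creating DataGuides but does not spell it out. The block below is a minimal sketch under stated assumptions, not the Lore implementation. It models the database as a rooted graph with labeled edges and builds a summary with the standard subset construction used to turn a nondeterministic automaton into a deterministic one. Each summary node stands for a "target set", the objects reachable from the root along one label path. The summary is concise because every label path appears exactly once. It is accurate because every path in the summary exists in the database. The encoding (object id mapped to `(label, child)` pairs), the function names `build_dataguide` and `lookup`, and the sample objects are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical database encoding: object id -> outgoing (label, child id) edges.
Graph = Dict[str, List[Tuple[str, str]]]
TargetSet = FrozenSet[str]


def build_dataguide(db: Graph, root: str) -> Tuple[Dict[int, Dict[str, int]], Dict[int, TargetSet]]:
    """Summarize db by subset construction, starting from root.

    Each summary node corresponds to one target set: the objects reachable
    from the root along some label path. Paths with the same target set share
    a node, and each node has at most one outgoing edge per label, so every
    label path in the database appears exactly once in the summary.
    """
    start: TargetSet = frozenset([root])
    node_of: Dict[TargetSet, int] = {start: 0}   # target set -> summary node
    edges: Dict[int, Dict[str, int]] = {}        # summary node -> label -> node
    pending = [start]
    while pending:
        targets = pending.pop()
        # Group the children of every object in the target set by label.
        by_label: Dict[str, Set[str]] = defaultdict(set)
        for obj in targets:
            for label, child in db.get(obj, []):
                by_label[label].add(child)
        out: Dict[str, int] = {}
        for label, children in sorted(by_label.items()):
            child_set = frozenset(children)
            if child_set not in node_of:       # first time this target set is seen
                node_of[child_set] = len(node_of)
                pending.append(child_set)
            out[label] = node_of[child_set]
        edges[node_of[targets]] = out
    return edges, {node: ts for ts, node in node_of.items()}


def lookup(edges: Dict[int, Dict[str, int]], targets_of: Dict[int, TargetSet],
           path: List[str]) -> Optional[TargetSet]:
    """Follow a label path from the summary root; None if no object has that path."""
    node = 0
    for label in path:
        nxt = edges[node].get(label)
        if nxt is None:
            return None
        node = nxt
    return targets_of[node]


# Small hypothetical database: two restaurants with partly different structure.
db: Graph = {
    "&1": [("Restaurant", "&2"), ("Restaurant", "&3")],
    "&2": [("Name", "&4"), ("Entree", "&5")],
    "&3": [("Name", "&6"), ("Owner", "&7")],
}
edges, targets_of = build_dataguide(db, "&1")
print(sorted(lookup(edges, targets_of, ["Restaurant", "Name"])))   # ['&4', '&6']
print(lookup(edges, targets_of, ["Restaurant", "Price"]))          # None
```

Because the construction memoizes on target sets, it terminates on cyclic data. As with any subset construction, the number of distinct target sets can grow exponentially in the worst case. For tree-shaped data, each object lies in exactly one target set, so the summary has no more nodes than the database has objects. A lookup such as the one above is the kind of check that supports query formulation: it reports whether a path exists and which objects it reaches.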
