A heuristics-based approach to query optimization in structured document databases

The number of documents published on the World Wide Web in SGML/HTML form has been growing rapidly for years. Efficient, declarative access mechanisms for this type of document, and for structured documents in general, are becoming increasingly important. This paper reports our most recent advances in the effective processing and optimization of structured document queries, which are important for large repositories of structured documents. Our methodology emphasizes applying only deterministic transformations to query expressions, so that the optimization process itself is as efficient as possible. We therefore propose a new approach that exploits DTD (document type definition) knowledge, structural properties, and structure indices of structured documents to achieve fast query optimization.
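
The abstract states the strategy (deterministic transformations driven by DTD knowledge) but not the concrete rules. The following Python sketch is a hypothetical illustration of two rewrites of that kind, not the method proposed in the paper. In rule 1, a path step that the DTD can never satisfy makes the whole query provably empty, so it can be answered without touching the data. In rule 2, a descendant step (`//`) for which the DTD admits exactly one child-axis path is expanded into explicit child steps, which a structure index over parent/child relationships could then evaluate directly. The DTD is simplified to a parent-to-children map, queries are plain `/` and `//` steps over element names, documents are assumed to be valid against the DTD, and every name (`BOOK_DTD`, `reachable`, `unique_child_path`, `rewrite`) is made up for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical DTD summary: element name -> names it may contain as children.
# Occurrence indicators (*, +, ?), attributes and #PCDATA are ignored.
Dtd = Dict[str, Set[str]]
# A query step is (axis, element): "/" = child axis, "//" = descendant axis.
Step = Tuple[str, str]

BOOK_DTD: Dtd = {
    "book": {"front", "body"},
    "front": {"title", "author"},
    "body": {"chapter"},
    "chapter": {"title", "section"},
    "section": {"title", "para", "section"},  # recursive content model
}


def reachable(dtd: Dtd, src: str, dst: str) -> bool:
    """True if the DTD lets dst occur as a proper descendant of src."""
    seen: Set[str] = set()
    frontier = list(dtd.get(src, ()))
    while frontier:
        node = frontier.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            frontier.extend(dtd.get(node, ()))
    return False


class _NoUniquePath(Exception):
    """Signals that several, or unboundedly many, child paths lead to dst."""


def unique_child_path(dtd: Dtd, src: str, dst: str) -> Optional[List[str]]:
    """Return the single child-axis path src -> dst allowed by the DTD, else None."""
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str], on_stack: Set[str]) -> None:
        for child in sorted(dtd.get(node, ())):
            if child != dst and not reachable(dtd, child, dst):
                continue  # this branch can never produce dst
            if child in on_stack:
                raise _NoUniquePath()  # recursion on a route to dst
            new_path = path + [child]
            if child == dst:
                paths.append(new_path)
                if len(paths) > 1:
                    raise _NoUniquePath()  # two distinct paths
            dfs(child, new_path, on_stack | {child})

    try:
        dfs(src, [], {src})
    except _NoUniquePath:
        return None
    return paths[0] if len(paths) == 1 else None


def rewrite(dtd: Dtd, start: str, query: List[Step]) -> Optional[List[Step]]:
    """Apply two deterministic DTD rules; None means the result is provably empty."""
    out: List[Step] = []
    context = start
    for axis, elem in query:
        if axis == "/":
            if elem not in dtd.get(context, set()):
                return None  # rule 1: the DTD never allows this child here
            out.append(("/", elem))
        elif axis == "//":
            if not reachable(dtd, context, elem):
                return None  # rule 1: elem can never be a descendant here
            path = unique_child_path(dtd, context, elem)
            if path is None:
                out.append(("//", elem))  # ambiguous or recursive: keep the step
            else:
                out.extend(("/", e) for e in path)  # rule 2: expand to child steps
        else:
            raise ValueError(f"unknown axis {axis!r}")
        context = elem
    return out


if __name__ == "__main__":
    print(rewrite(BOOK_DTD, "book", [("//", "author")]))  # [('/', 'front'), ('/', 'author')]
    print(rewrite(BOOK_DTD, "book", [("//", "para")]))    # [('//', 'para')]
    print(rewrite(BOOK_DTD, "book", [("//", "figure")]))  # None
```

Both rules are deterministic: each one fires or not based only on the DTD, with no cost estimation or plan enumeration. The recursive `section` element shows the conservative side of the sketch: because sections may nest, `book//para` matches through unboundedly many child paths, so the `//` step is kept rather than expanded.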