Querying documents in object databases

structuring schemas that consist of grammars annotated with database programs. To query documents, we introduce an extension of OQL, the ODMG standard query language for object databases. Our extension, named OQL-doc, allows documents to be queried without precise knowledge of their structure, in particular by means of generalized path expressions and pattern matching. This brings navigational and information-retrieval styles of data access into a declarative language in the style of SQL or OQL. Query processing in the context of documents and path expressions raises challenging implementation issues. We extend an object algebra with new operators to handle generalized path expressions. We then consider two essential, complementary optimization techniques. First, we show that almost standard database optimization techniques can be used to answer queries without loading the entire document into the database. Second, we consider the interaction of full-text indexes (e.g., inverted files) with standard database collection indexes (e.g., B-trees), which provides an important speed-up.
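
As a rough illustration of what querying "without a precise knowledge of the structure" can mean, the following Python sketch walks a nested document and returns every attribute with a given name whose text matches a pattern, wherever it occurs. It is not OQL-doc syntax and not the algebra described above; the function names (`walk`, `find`) and the toy document are assumptions made for this example only.

```python
import re
from typing import Any, Iterator, List, Tuple

Path = Tuple[str, ...]


def walk(node: Any, prefix: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (path, value) for the node itself and everything below it."""
    yield prefix, node
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, prefix + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, prefix + (f"[{index}]",))


def find(root: Any, attribute: str, pattern: str) -> List[Tuple[Path, str]]:
    """Return (path, text) pairs for every `attribute` at any depth whose
    string value matches `pattern`. The path is left unconstrained, which
    loosely mimics a generalized path expression combined with pattern
    matching (hypothetical helper, not part of any standard)."""
    regex = re.compile(pattern)
    return [
        (path, value)
        for path, value in walk(root)
        if path
        and path[-1] == attribute
        and isinstance(value, str)
        and regex.search(value)
    ]


# Toy document, invented for illustration only.
toy_document = {
    "title": "Query languages",
    "sections": [
        {"title": "Introduction", "body": "Some text."},
        {
            "title": "Query processing",
            "subsections": [{"title": "Path expressions", "body": "More text."}],
        },
    ],
}

matches = find(toy_document, "title", r"(?i)quer")
for path, text in matches:
    print("/".join(path), "->", text)
```

The caller names only the attribute (`title`) and a text pattern, not the route from the root to it. A real system would avoid this exhaustive traversal, which is where the optimization techniques mentioned in the abstract come in.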
