Storing semistructured data with STORED

Systems for managing and querying semistructured-data sources often store data in proprietary object repositories or in a tagged-text format. We describe a technique that can use relational database management systems to store and manage semistructured data. Our technique relies on a mapping between the semistructured data model and the relational data model, expressed in a query language called STORED. When a semistructured data instance is given, a STORED mapping can be generated automatically using data-mining techniques. We are interested in applying STORED to XML data, which is an instance of semistructured data. We show how a document-type-descriptor (DTD), when present, can be exploited to further improve performance.
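As a rough illustration of the kind of mapping described above, the following Python sketch stores a small set of semistructured objects in a relational table. It is not the STORED language itself and not the authors' algorithm. A simple attribute-frequency threshold stands in for the data-mining step, and the names `choose_columns`, `store`, `min_support`, the `item` table, and the `overflow` column are all hypothetical. Attributes that occur often become columns; the remaining attributes of each object are kept as JSON text so that no data is lost.

```python
import json
import sqlite3
from collections import Counter


def choose_columns(objects, min_support):
    """Pick attributes that occur in at least a min_support fraction of objects.

    Assumption: a plain frequency count stands in for the data-mining step.
    """
    counts = Counter(key for obj in objects for key in obj)
    threshold = min_support * len(objects)
    return sorted(key for key, count in counts.items() if count >= threshold)


def store(objects, min_support=0.5):
    """Load objects into an in-memory SQLite table (hypothetical schema).

    Frequent attributes become TEXT columns; everything else goes into a
    JSON-encoded 'overflow' column. Attribute names are assumed to be
    simple identifiers.
    """
    columns = choose_columns(objects, min_support)
    defs = ["oid INTEGER PRIMARY KEY"]
    defs += [f'"{c}" TEXT' for c in columns]
    defs += ["overflow TEXT"]
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE item ({', '.join(defs)})")

    placeholders = ", ".join("?" for _ in range(len(columns) + 2))
    for oid, obj in enumerate(objects):
        rest = {k: v for k, v in obj.items() if k not in columns}
        overflow = json.dumps(rest, sort_keys=True) if rest else None
        row = [oid] + [obj.get(c) for c in columns] + [overflow]
        conn.execute(f"INSERT INTO item VALUES ({placeholders})", row)
    return conn, columns


# Example data: three objects with irregular structure.
objects = [
    {"name": "Ann", "phone": "555-1234"},
    {"name": "Bob", "phone": "555-9876", "email": "bob@example.org"},
    {"name": "Cy", "fax": "555-0000"},
]
conn, columns = store(objects)
```

With the default threshold, `name` and `phone` become columns, while `email` and `fax` end up in the overflow column of their respective rows. A real mapping would also need to handle nested objects and repeated attributes, which this sketch ignores.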
