PowerDB-XML: A Platform for Data-Centric and Document-Centric XML Processing

Relational database systems are well-suited as a platform for data-centric XML processing. Data-centric applications process regularly structured XML documents using precise predicates. However, such relational approaches fall short when XML applications also require document-centric processing, i.e., processing of less rigidly structured documents using vague predicates in the sense of information retrieval. The PowerDB-XML project at ETH Zurich aims to address this drawback and to cover both types of XML applications on a single platform. In this paper, we investigate the requirements of document-centric XML processing and propose to refine state-of-the-art retrieval models for flat, unstructured documents so that they meet the flexibility of the XML format. To do so, we rely on so-called query-specific statistics, which are computed dynamically at query runtime to reflect the query scope. Moreover, we show that document-centric XML processing can be performed efficiently using relational database systems for storage management together with standard SQL. This allows us to combine document-centric processing with data-centric XML-to-database mappings. Our XML engine, PowerDB-XML, therefore supports the full range of XML applications on the same integrated platform.
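The idea of query-specific statistics computed with standard SQL can be illustrated with a minimal sketch. It assumes a hypothetical `postings` table (element id, element path, term, term frequency) held in SQLite, and hypothetical helpers `build_index` and `scoped_scores`; none of these are PowerDB-XML's actual schema or API. The query scope is simplified to an exact element path, and plain tf-idf stands in for the refined retrieval models described above. The point shown is that the collection size N and the document frequencies df are counted at query time over the in-scope elements only, not over the whole collection.

```python
import math
import sqlite3


def build_index():
    """Create a toy inverted list in SQLite (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # One row per (element, term): the element id, its root-to-element
    # tag path, the term, and the term's frequency within that element.
    conn.execute(
        "CREATE TABLE postings (elem_id INTEGER, path TEXT, term TEXT, tf INTEGER)"
    )
    conn.executemany(
        "INSERT INTO postings VALUES (?, ?, ?, ?)",
        [
            (1, "/lib/book/title", "xml", 2),
            (1, "/lib/book/title", "database", 1),
            (2, "/lib/book/title", "xml", 1),
            (2, "/lib/book/title", "retrieval", 1),
            (3, "/lib/article/abstract", "xml", 1),
            (3, "/lib/article/abstract", "retrieval", 3),
            (4, "/lib/article/abstract", "database", 2),
        ],
    )
    return conn


def scoped_scores(conn, scope, query_terms):
    """Rank elements whose path equals `scope` by tf-idf, computing N and df
    at query time over that scope only (query-specific statistics)."""
    # N: number of in-scope elements that have at least one indexed term.
    (n,) = conn.execute(
        "SELECT COUNT(DISTINCT elem_id) FROM postings WHERE path = ?", (scope,)
    ).fetchone()
    if n == 0:
        return []
    marks = ",".join("?" * len(query_terms))
    # df: number of in-scope elements that contain each query term.
    df = dict(
        conn.execute(
            f"SELECT term, COUNT(DISTINCT elem_id) FROM postings "
            f"WHERE path = ? AND term IN ({marks}) GROUP BY term",
            (scope, *query_terms),
        )
    )
    idf = {term: math.log(n / d) for term, d in df.items()}
    # Accumulate tf * idf per element for the query terms it contains.
    scores = {}
    for elem_id, term, tf in conn.execute(
        f"SELECT elem_id, term, tf FROM postings "
        f"WHERE path = ? AND term IN ({marks})",
        (scope, *query_terms),
    ):
        scores[elem_id] = scores.get(elem_id, 0.0) + tf * idf[term]
    # Highest-scoring elements first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    conn = build_index()
    print(scoped_scores(conn, "/lib/book/title", ["xml", "retrieval"]))
    print(scoped_scores(conn, "/lib/article/abstract", ["xml", "retrieval"]))
```

With this toy data, "xml" occurs in every in-scope title element, so its idf within the title scope is log(2/2) = 0 and it contributes nothing to the ranking there. Within the abstract scope it occurs in only one of two elements and does discriminate. Statistics computed once over the whole collection could not reflect this difference, which is the motivation for deriving them from the query scope.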
