Algebra-Based Optimization of XML-Extended OLAP Queries

In today’s OLAP systems, physically integrating fast-changing data into a cube is complex and time-consuming. Our solution, the “OLAP-XML Federation System,” makes it possible to reference fast-changing data in XML format in OLAP queries without physical integration. In this paper, we introduce novel query optimization techniques specialized for the federation system, including a query optimizer and plan transformation rules. We also present experimental results which suggest that our approach, unlike physical integration, is a practical solution for integrating fast-changing data into OLAP systems.

Current OLAP systems share a common problem: they cope poorly when data requirements change often and the data itself changes frequently. Physical integration of new data into OLAP systems is a long and time-consuming process. The increasing use of XML suggests that the required data will often be available in XML format. Therefore, a logical integration of OLAP and XML data is desirable. Our overall solution is to logically federate the OLAP and XML data sources, decorating the OLAP cube with virtual dimensions based on external XML data, and thereby allowing selections and aggregations to be performed over the decorated cube. In this paper, we extend previous work [10, 14] by presenting novel query optimization techniques specialized for the logical federation system, including a functioning implementation of the query optimizer for the OLAP-XML query engine and a set of plan transformation rules based on the logical algebra of OLAP-XML federations. We also report experiments on the query engine implemented with these techniques, covering federation performance, optimization effectiveness, and feasibility. The results suggest that the logical OLAP-XML federation system can be a practical solution for gaining flexible access to fast-changing data in XML format from OLAP systems.
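The idea of decorating a cube with a virtual dimension taken from external XML data can be illustrated with a minimal sketch. This is not the federation system's implementation or its algebra: the fact rows, the XML document, and the names `decoration_map` and `decorated_sum` are all hypothetical, chosen only to show an aggregation over a dimension that is never physically loaded into the cube.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical cube facts: (supplier key, quantity). Assumed for illustration.
FACTS = [("S1", 10), ("S2", 5), ("S3", 7), ("S1", 3)]

# Hypothetical external XML source holding fast-changing data.
XML_DOC = """
<suppliers>
  <supplier id="S1"><country>DK</country></supplier>
  <supplier id="S2"><country>DE</country></supplier>
</suppliers>
"""

def decoration_map(xml_text):
    """Map each supplier key to its country as found in the XML document."""
    root = ET.fromstring(xml_text)
    return {s.get("id"): s.findtext("country") for s in root.iter("supplier")}

def decorated_sum(facts, xml_text, unknown="N/A"):
    """Sum quantities grouped by the XML-derived (virtual) dimension.

    Facts whose key has no match in the XML are grouped under `unknown`
    rather than dropped, so totals are preserved.
    """
    country_of = decoration_map(xml_text)
    totals = defaultdict(int)
    for supplier, quantity in facts:
        totals[country_of.get(supplier, unknown)] += quantity
    return dict(totals)

print(decorated_sum(FACTS, XML_DOC))
```

Because the XML is read at query time, a change to the document is reflected in the next query without reloading the cube, which is the property that motivates logical rather than physical integration.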
There has been a great deal of previous work on data integration, for instance, on relational data [4, 5, 9], semi

[1]  Soon-Young Huh,et al.  Federated Process Framework in a Virtual Enterprise Using an Object-oriented Database and Extensible Markup Language , 2003, J. Database Manag..

[2]  Goetz Graefe,et al.  Query evaluation techniques for large databases , 1993, CSUR.

[3]  Yu Li,et al.  Representing UML snowflake diagram from integrating XML data using XML schema , 2005, International Workshop on Data Engineering Issues in E-Commerce.

[4]  Weimin Du,et al.  Query Optimization in a Heterogeneous DBMS , 1992, VLDB.

[5]  Torben Bach Pedersen,et al.  Multidimensional data modeling for complex data , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[6]  Huajun Chen,et al.  RDF/RDFS-based Relational Database Integration , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[7]  Torben Bach Pedersen,et al.  Cost Modeling and Estimation for OLAP-XML Federations , 2002, DaWaK.

[8]  Arie Shoshani,et al.  Extending OLAP querying to external object databases , 2000, CIKM '00.

[9]  Abraham Silberschatz,et al.  Database System Concepts , 1980 .

[10]  Ramez Elmasri,et al.  Fundamentals of Database Systems , 1989 .

[11]  Sunita Sarawagi,et al.  Integrating Unstructured Data into Relational Databases , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[12]  Luca Cabibbo,et al.  Integrating Heterogeneous Multidimensional Databases , 2005, SSDBM.

[13]  Michael Stonebraker,et al.  Independent, Open Enterprise Data Integration , 1999, IEEE Data Eng. Bull..

[14]  Qiang Zhu,et al.  Global Query Processing and Optimization in the CORDS Multidatabase System , 1996 .

[15]  Roy Goldman,et al.  WSQ/DSQ: a practical approach for combined querying of databases and the Web , 2000, SIGMOD '00.

[16]  Raghu Ramakrishnan,et al.  Database Management Systems , 1976 .

[17]  Torben Bach Pedersen,et al.  A relevance-extended multi-dimensional model for a data warehouse contextualized with documents , 2005, DOLAP '05.

[18]  Tok Wang Ling,et al.  A Semantic Approach to Query Rewriting for Integrated XML Data , 2005, ER.

[19]  Torben Bach Pedersen,et al.  Evaluating XML-extended OLAP queries based on a physical algebra , 2004, DOLAP '04.

[20]  Volker Markl,et al.  POP/FED: progressive query optimization for federated queries in DB2 , 2006, VLDB.

[21]  Erik Thomsen,et al.  OLAP Solutions - Building Multidimensional Information Systems , 1997 .

[22]  Arie Shoshani,et al.  Summarizability in OLAP and statistical data bases , 1997, Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150).

[23]  Laks V. S. Lakshmanan,et al.  nD-SQL: A Multi-Dimensional Language for Interoperability and OLAP , 1998, VLDB.

[24]  Arie Shoshani,et al.  OLAP++: Powerful and Easy-to-Use Federations of OLAP and Object Databases , 2000, VLDB.

[25]  Chin-Wan Chung,et al.  Exploiting Versions for On-line Data Warehouse Maintenance in MOLAP Servers , 2002, VLDB.

[26]  Jennifer Widom,et al.  The TSIMMIS Project: Integration of Heterogeneous Information Sources , 1994, IPSJ.

[27]  Michael Stonebraker,et al.  Open enterprise data integration , 1999 .

[28]  Torben Bach Pedersen,et al.  Integrating XML Data in the TARGITOLAP System , 2004, ICDE.

[29]  Yvan Bédard,et al.  Handling evolutions in multidimensional structures , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[30]  Jennifer Widom,et al.  Ozone: Integrating Structured and Semistructured Data , 1999, DBPL.

[31]  Scott Boag,et al.  XQuery 1.0 : An XML Query Language , 2007 .

[32]  Jennifer Widom,et al.  The TSIMMIS Approach to Mediation: Data Models and Languages , 1997, Journal of Intelligent Information Systems.

[33]  Johann Eder,et al.  Changes of Dimension Data in Temporal Data Warehouses , 2001, DaWaK.

[34]  Torben Bach Pedersen,et al.  Query optimization for OLAP-XML federations , 2002, DOLAP '02.

[35]  Laura M. Haas,et al.  Optimizing Queries Across Diverse Data Sources , 1997, VLDB.

[36]  Laura M. Haas,et al.  The Garlic project , 1996, SIGMOD '96.

[37]  Alberto O. Mendelzon,et al.  Maintaining data cubes under dimension updates , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[38]  Donovan A. Schneider,et al.  Modeling and querying multidimensional data sources in Siebel Analytics: a federated relational system , 2005, SIGMOD '05.

[39]  Goetz Graefe,et al.  The Volcano optimizer generator: extensibility and efficient search , 1993, Proceedings of IEEE 9th International Conference on Data Engineering.

[40]  Jeffrey F. Naughton,et al.  Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies , 1996, VLDB.

[41]  Torben Bach Pedersen,et al.  XML-extended OLAP querying , 2002, Proceedings 14th International Conference on Scientific and Statistical Database Management.