Applying Grid Technologies to XML Based OLAP Cube Construction

On-Line Analytical Processing (OLAP) is a powerful method for analysing large volumes of data warehouse data. Typically, the data for an OLAP database is collected from a set of data repositories such as operational databases. This data set is often huge, and it may not be known in advance which data will be needed, and when, for the desired data analysis tasks. Some parts of the data may be needed only occasionally. Therefore, storing all data in the OLAP database and keeping this database constantly up to date is not only highly demanding but may also be overkill in practice. This suggests that in some applications it would be more feasible to form the OLAP cubes only when they are actually needed. However, OLAP cube construction can be a slow process. We therefore present a system that applies Grid technologies to distribute the computation. As the data sources may well be heterogeneous, we propose an XML language for data collection. The user's definition of a new OLAP cube often includes selecting and aggregating the data. In our system this computation is distributed to the computers that store the original data. This reduces network traffic and speeds up the computation, which is now performed in parallel. The sub-results are sent back to the 'collecting server'. Usually, the results do not arrive simultaneously, but the collecting server starts to process each sub-result immediately after it has arrived, so there is no need to wait until all sub-results have been received. We have implemented a prototype of the system. The implementation applies Spitfire software and Mobile Analyzer technology. Both are Grid-based products that use the Grid Security Infrastructure.
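
The distribution scheme described above can be illustrated with a minimal Python sketch. It is not the prototype's implementation: worker threads stand in for the Grid nodes that hold the data, and the site names, row layout, and the functions `local_aggregate` and `build_cube` are hypothetical names introduced only for this illustration. Each source selects and aggregates its own rows, and the collecting side merges each sub-result as soon as it arrives.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical data held at three sources: (region, product, amount) rows.
SITES = {
    "site_a": [("north", "tea", 10), ("south", "tea", 5), ("north", "coffee", 7)],
    "site_b": [("north", "tea", 3), ("south", "coffee", 8)],
    "site_c": [("south", "tea", 2), ("north", "coffee", 1)],
}


def local_aggregate(rows, min_amount=0):
    """Runs where the data lives: select rows, then sum amounts per cell."""
    partial = Counter()
    for region, product, amount in rows:
        if amount >= min_amount:  # selection step
            partial[(region, product)] += amount  # aggregation step
    return partial


def build_cube(sites, min_amount=0):
    """Collecting side: merge each sub-result as soon as it is available."""
    cube = Counter()
    # Threads are an assumption standing in for remote Grid nodes.
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        futures = [
            pool.submit(local_aggregate, rows, min_amount)
            for rows in sites.values()
        ]
        # as_completed yields futures in completion order, so merging
        # starts before the slowest source has finished.
        for future in as_completed(futures):
            cube.update(future.result())  # Counter.update adds the sums
    return cube


CUBE = build_cube(SITES)
```

Because summation is associative and commutative, the order in which sub-results arrive does not affect the final cube. Only the small per-source aggregates travel to the collecting side, not the raw rows.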
