A model for distributing and querying a data warehouse on a computing grid

Data warehouses store large volumes of data according to a multidimensional model whose dimensions represent different axes of analysis. OLAP (online analytical processing) systems allow users to explore the data warehouse interactively. Growing data volumes and complexity favor more powerful distributed computing architectures, and computing grids in particular are built for the decentralized management of heterogeneous distributed resources. Their lack of centralized control, however, conflicts with classic centralized data warehouse models. Operating a data warehouse on a computing grid infrastructure therefore requires solving several problems. First, the warehouse data must be uniquely identified and judiciously partitioned to allow efficient distribution, querying and exchange among the nodes of the grid. We propose a data model based on "chunks": atomic units of warehouse data that can be uniquely identified. We then group contiguous blocks of these chunks into suitable fragments of the data warehouse. Second, the fragments stored on each grid node must be indexed in a uniform way so that they can interact effectively with existing grid services. Our indexing structure combines a lattice that maps queries to warehouse fragments with a specialized spatial index built from X-trees, which provides the information needed to generate optimized query evaluation plans.
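
The abstract does not specify how chunks are delimited or numbered. A minimal sketch, assuming a regular grid in which every dimension is split into fixed-width ranges of member indexes: each chunk receives a unique integer by row-major linearization of its chunk coordinates, and a fragment is a hyper-rectangular block of contiguous chunks. The class `ChunkedCube`, the helper `linearize`, and the parameters `chunk_widths` and `block_shape` are hypothetical names chosen for illustration.

```python
from itertools import product
from math import ceil


def linearize(coords, extents):
    """Row-major linearization: maps a coordinate tuple inside `extents` to a unique integer."""
    idx = 0
    for c, n in zip(coords, extents):
        if not 0 <= c < n:
            raise ValueError(f"coordinate {c} outside extent {n}")
        idx = idx * n + c
    return idx


class ChunkedCube:
    """Hypothetical regular chunking of a cube; cells are tuples of member indexes."""

    def __init__(self, dim_sizes, chunk_widths, block_shape):
        self.chunk_widths = tuple(chunk_widths)  # members per chunk, per dimension
        self.block_shape = tuple(block_shape)    # chunks per fragment, per dimension
        # number of chunks along each dimension (the last chunk may be partial)
        self.chunk_grid = tuple(ceil(n / w) for n, w in zip(dim_sizes, chunk_widths))
        # number of fragments along each dimension (the last fragment may be partial)
        self.fragment_grid = tuple(ceil(g / b) for g, b in zip(self.chunk_grid, block_shape))

    def chunk_of(self, cell):
        return tuple(c // w for c, w in zip(cell, self.chunk_widths))

    def chunk_id(self, cell):
        return linearize(self.chunk_of(cell), self.chunk_grid)

    def fragment_of(self, chunk):
        return tuple(c // b for c, b in zip(chunk, self.block_shape))

    def fragment_id(self, cell):
        return linearize(self.fragment_of(self.chunk_of(cell)), self.fragment_grid)

    def chunks_in_fragment(self, fragment):
        """Ids of the contiguous chunks that make up one fragment."""
        ranges = [
            range(f * b, min((f + 1) * b, g))
            for f, b, g in zip(fragment, self.block_shape, self.chunk_grid)
        ]
        return [linearize(ch, self.chunk_grid) for ch in product(*ranges)]


# Example: 100 days x 12 products x 50 stores, 10x3x10-cell chunks, 2x2x5-chunk fragments.
cube = ChunkedCube((100, 12, 50), (10, 3, 10), (2, 2, 5))
print(cube.chunk_id((37, 7, 42)))     # 74
print(cube.fragment_id((37, 7, 42)))  # 3
```

Because chunk ids depend only on the cell coordinates and the shared chunking parameters, any grid node can compute them locally without a central catalog, which is the property a decentralized setting needs.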

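The abstract names the two index components without detailing them. The following hypothetical sketch illustrates only the general idea. Lattice nodes are cuboids (sets of dimensions) of the usual data cube lattice, each holding the fragments stored for it. A query is mapped to the cuboid with the fewest dimensions that still contains every grouped and filtered dimension (a crude stand-in for a real cost estimate). The fragments of that cuboid are then filtered by bounding-box overlap with the query's ranges. That overlap test is a plain linear scan standing in for the X-tree, which would index the same boxes to avoid examining every fragment. `Fragment`, `FragmentLattice`, `overlaps` and the dimension names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    fragment_id: str
    box: dict  # {dimension: (lo, hi)}: inclusive member ranges covered by the fragment


def overlaps(box, ranges):
    """True unless some filtered dimension's range is disjoint from the fragment's box."""
    for dim, (qlo, qhi) in ranges.items():
        if dim not in box:
            continue  # fragment is treated as unbounded on this dimension
        lo, hi = box[dim]
        if hi < qlo or lo > qhi:
            return False
    return True


class FragmentLattice:
    """Hypothetical lattice: each node is a cuboid (set of dimensions) with its fragments."""

    def __init__(self):
        self.cuboids = {}  # frozenset(dimensions) -> list[Fragment]

    def register(self, dims, fragment):
        self.cuboids.setdefault(frozenset(dims), []).append(fragment)

    def best_cuboid(self, needed):
        """Fewest-dimension cuboid among those containing all `needed` dimensions."""
        needed = frozenset(needed)
        candidates = [c for c in self.cuboids if needed <= c]
        if not candidates:
            raise LookupError(f"no stored cuboid covers {sorted(needed)}")
        return min(candidates, key=lambda c: (len(c), sorted(c)))

    def fragments_for(self, group_by, ranges):
        # the cuboid must keep both the grouped and the filtered dimensions
        cuboid = self.best_cuboid(set(group_by) | set(ranges))
        # linear scan standing in for the spatial (X-tree) range search
        return [f for f in self.cuboids[cuboid] if overlaps(f.box, ranges)]


lat = FragmentLattice()
base = {"time", "product", "store"}
lat.register(base, Fragment("base-0", {"time": (0, 5), "product": (0, 99), "store": (0, 9)}))
lat.register(base, Fragment("base-1", {"time": (6, 11), "product": (0, 99), "store": (0, 9)}))
lat.register({"time", "product"}, Fragment("tp-0", {"time": (0, 11), "product": (0, 49)}))
lat.register({"time", "product"}, Fragment("tp-1", {"time": (0, 11), "product": (50, 99)}))

print([f.fragment_id for f in lat.fragments_for({"product"}, {"product": (60, 70)})])  # ['tp-1']
print([f.fragment_id for f in lat.fragments_for({"store"}, {"time": (0, 3)})])         # ['base-0']
```

Using the dimension count as the cost proxy keeps the example short; a planner working from actual fragment sizes and their locations on grid nodes would likely choose differently.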