The OLAP-Enabled Grid: Model and Query Processing Algorithms

The operation of modern distributed enterprises, be they commercial, scientific, or health-related, generates massive quantities of data. Decision makers increasingly use On-Line Analytical Processing (OLAP) tools to glean from this rich data resource nuggets of information that help them run their enterprises better. A typical approach to OLAP has three steps. First, all of the raw data is copied from the sites where it is generated to a central location. Second, it is integrated into a single centralized repository. Third, all queries are routed to that repository. As the amount of data and the number of sites and users grow, this approach suffers from significant scalability problems. In this paper, we present a model and algorithmic framework for an "OLAP-Enabled Grid" whose goal is the efficient support of OLAP operations. We show how a Grid computing infrastructure can be used to store and manage expensive-to-compute data aggregations and to answer OLAP queries in a fully distributed manner. Our focus is on using resources efficiently when answering queries, through a distributed query algorithm that draws on cached and pre-aggregated data stored across the Grid.
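The abstract does not spell out the distributed query algorithm. The following is only a minimal sketch of the general idea behind answering OLAP queries from cached, pre-aggregated data spread over Grid sites. It relies on the standard data-cube rule: a group-by query can be answered from any aggregate that retains every dimension the query groups by, because the remaining dimensions can be rolled up at query time. The cost model is an assumption. It charges one unit per row scanned, plus a per-row transfer cost when the aggregate lives on a remote site and must be shipped whole. All names (`CachedAggregate`, `plan_query`, `transfer_cost`, the site names) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedAggregate:
    """A pre-aggregated view cached on some Grid site (hypothetical model)."""
    site: str         # hypothetical name of the Grid site holding the aggregate
    dims: frozenset   # group-by dimensions the aggregate retains
    rows: int         # number of rows, used as a proxy for scan cost


def can_answer(agg: CachedAggregate, query_dims: frozenset) -> bool:
    # An aggregate answers a group-by query if it keeps every queried
    # dimension; the extra dimensions are rolled up (summed away) later.
    return query_dims <= agg.dims


def plan_query(query_dims, aggregates, transfer_cost, local_site):
    """Return (cheapest usable aggregate, its cost), or (None, inf).

    Assumed cost model: rows scanned, plus rows * per-row transfer cost
    when the aggregate is remote and is shipped whole to local_site.
    A (None, inf) result means the caller must fall back to raw data.
    """
    best, best_cost = None, float("inf")
    for agg in aggregates:
        if not can_answer(agg, query_dims):
            continue
        cost = agg.rows
        if agg.site != local_site:
            cost += agg.rows * transfer_cost[agg.site]
        if cost < best_cost:
            best, best_cost = agg, cost
    return best, best_cost


if __name__ == "__main__":
    aggs = [
        CachedAggregate("siteA", frozenset({"region", "product", "month"}), 50_000),
        CachedAggregate("siteB", frozenset({"region", "month"}), 2_000),
        CachedAggregate("siteA", frozenset({"product"}), 300),
    ]
    costs = {"siteA": 0.0, "siteB": 0.5}
    best, cost = plan_query(frozenset({"region"}), aggs, costs, local_site="siteA")
    print(best.site, cost)  # siteB 3000.0 (2,000 rows scanned + 1,000 transfer)
```

In this example the remote, smaller aggregate on `siteB` beats the large local one, despite the transfer charge. That trade-off is the kind a Grid-wide planner has to weigh. A realistic planner would also consider pushing the roll-up to the remote site so that only the smaller result is shipped.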
