Data volumes for decision-oriented technologies, such as data warehousing and On-Line Analytical Processing (OLAP) systems, are ever increasing. Because these systems store and handle historical data, the volume of data involved can be very large and calls for more efficient ways of handling it. Recent advances in parallel computing and high-speed networks, using a cluster of PCs or workstations (COW), offer a way to scale up performance by parallelising both the data in the data warehouse and its processing. However, there are issues peculiar to clusters that first need to be considered. An important issue for a non-dedicated cluster is adapting to the changes in its environment caused by its usage. This paper looks at some of the adaptation strategies that could be applied to data warehousing and OLAP systems implemented on a COW.

1. INTRODUCTION

New decision support technologies, such as data warehousing and on-line analytical processing (OLAP), have seen explosive growth in their usage in recent years [5]. A data warehouse can be defined as an online repository of historical enterprise data that is aimed at enabling the knowledge worker (executive, manager, analyst) to make better and faster decisions [4, 6]. There are two main reasons for using a data warehouse: management of historical data and better performance. While the operational database is tuned to handle the day-to-day operations of the organisation, the data warehouse is geared to facilitate the management's ad-hoc queries for decision-making.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC'00, March 19-21, Como, Italy. (c) 2000 ACM 1-58113-239-5/00/003...$5.00

Often, knowledge workers need complex analyses and visualisation of data. For this purpose, the data in a warehouse is typically modeled multidimensionally [10]. For example, in a product sales data warehouse, date, product code, region, salesperson and customer might be some of the dimensions of interest, while detailed information regarding each sale (total price, quantity, etc.) is recorded in a fact table. Figure 1 shows an example of such a database schema, called a star schema. Some of these dimensions may be hierarchical. For example, date of sale may be organised as a month-quarter-year hierarchy.

The operations of OLAP allow users to efficiently retrieve data from the data warehouse and are quite different from those of on-line transaction processing (OLTP). Typical OLAP operations include roll-up (increasing the level of aggregation; for example, sales for a whole year rather than by quarter) and drill-down (decreasing the level of aggregation, or increasing the detail) along one or more dimension hierarchies, slice-and-dice (selection and projection) and pivot (re-orienting the multidimensional view of the data) [10].

Because the applications and operations on a data warehouse differ in these ways, a typical operational database cannot provide an acceptable level of performance for OLAP queries. Also, the data required for decision support may be missing from such a database (for example, showing trends in sales for a certain region or city over the years requires historical data, which is not available in a typical operational database).

As the data in a data warehouse keeps increasing, so does the need for faster methods of accessing and processing it.
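
To make the OLAP operations named in this section concrete, the following is a minimal sketch in Python, not a description of any actual warehouse system. The fact table rows, the column names and the helper functions (aggregate, slice_rows, pivot) are all assumptions invented for illustration; the date dimension is reduced to a year-quarter hierarchy.

```python
from collections import defaultdict

# Hypothetical fact table; every value below is made up for illustration.
COLUMNS = ("year", "quarter", "region", "product", "quantity", "total_price")
SALES = [
    (1998, "Q1", "North", "P1", 10, 100.0),
    (1998, "Q2", "North", "P1", 5, 50.0),
    (1998, "Q1", "South", "P2", 8, 160.0),
    (1999, "Q1", "North", "P2", 3, 60.0),
    (1999, "Q3", "South", "P1", 7, 70.0),
]


def aggregate(rows, group_by, measure="total_price"):
    """Sum `measure` over `rows`, grouped by the dimension names in `group_by`."""
    key_idx = [COLUMNS.index(name) for name in group_by]
    m_idx = COLUMNS.index(measure)
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in key_idx)] += row[m_idx]
    return dict(totals)


def slice_rows(rows, **fixed):
    """Slice: keep only rows whose named dimensions equal the given values."""
    return [r for r in rows
            if all(r[COLUMNS.index(k)] == v for k, v in fixed.items())]


def pivot(totals):
    """Pivot a two-dimensional aggregate {(a, b): v} into {b: {a: v}}."""
    out = defaultdict(dict)
    for (a, b), v in totals.items():
        out[b][a] = v
    return dict(out)


# Drill-down level: sales per (year, quarter).
by_quarter = aggregate(SALES, ["year", "quarter"])
# Roll-up along the date hierarchy: sales per year.
by_year = aggregate(SALES, ["year"])
# Slice on region = "North", then aggregate per year.
north_by_year = aggregate(slice_rows(SALES, region="North"), ["year"])
# Pivot: view the (year, region) aggregate with region as the outer axis.
region_by_year = pivot(aggregate(SALES, ["year", "region"]))
```

In this sketch, moving from by_year to by_quarter corresponds to a drill-down, and the reverse to a roll-up. Every operation scans the fact table rows it is given, which is the kind of work that grows with the size of the warehouse.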
Even if processor (CPU) speed is increased, the steep growth of large data warehouses will eventually outstrip the available processing capability. This forces us to look into alternative methods of speeding up the processing of OLAP and other types of queries on a data warehouse. Faster processors and more efficient query evaluation algorithms can help, but only to a certain extent. A number of techniques [11, 15, 16] have been suggested by researchers to speed up query processing in OLAP. Some of these techniques
[1] Gregory F. Pfister, et al. In Search of Clusters. 1995.
[2] Yue Zhuge, et al. Distributed and parallel computing issues in data warehousing (abstract). PODC '98, 1998.
[3] Joel H. Saltz, et al. The utility of exploiting idle workstations for parallel computation. SIGMETRICS '97, 1997.
[4] Jack Dongarra, et al. MPI: The Complete Reference. 1996.
[5] Thomas R. Gross, et al. Transparent adaptive parallelism on NOWs using OpenMP. PPoPP '99, 1999.
[6] Alok N. Choudhary, et al. Design and implementation of a scalable parallel system for multidimensional analysis and OLAP. Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing (IPPS/SPDP 1999), 1999.
[7] Jehoshua Bruck, et al. Efficient Message Passing Interface (MPI) for Parallel Computing on Clusters of Workstations. J. Parallel Distributed Comput., 1997.
[8] Yue Zhuge, et al. Distributed and Parallel Computing Issues in Data Warehousing (Invited Talk). 1998.
[9] Jeffrey F. Naughton, et al. Simultaneous optimization and evaluation of multiple dimensional queries. SIGMOD '98, 1998.
[10] Inderpal Singh Mumick, et al. Maintenance of data cubes and summary tables in a warehouse. SIGMOD '97, 1997.
[11] Rajkumar Buyya. Cluster Computing: The Commodity Supercomputing. 1988.
[12] Surajit Chaudhuri, et al. An overview of data warehousing and OLAP technology. SGMD, 1997.
[13] Bongki Moon, et al. A case for parallelism in data warehousing and OLAP. Proceedings Ninth International Workshop on Database and Expert Systems Applications (Cat. No.98EX130), 1998.
[14] Jack Dongarra, et al. PVM: Parallel virtual machine: a users' guide and tutorial for networked parallel computing. 1995.
[15] Rajkumar Buyya, et al. Cluster computing: the commodity supercomputer. 1999.
[16] Peter M. G. Apers, et al. Parallel evaluation of multi-join queries. SIGMOD '95, 1995.