Defining data placement and allocation in the disk subsystem can have a significant impact on data warehouse performance. However, our experiences with data warehouse implementations show that the database storage layout is often subject to vague or even invalid assumptions about I/O performance trade-offs. Clear guidelines for the assignment of database objects to disks are a very common request from data warehouse DBAs and consultants. We review best practices suggested by storage and database vendors, and present two sets of performance measurements that compare storage layout alternatives and their implications. The first set used a TPC-H benchmark workload with DB2 UDB, the other a star schema/star join scenario with IBM Red Brick Data Warehouse.
[1]
Gerhard Weikum,et al.
Data partitioning and load balancing in parallel disk systems
,
1998,
The VLDB Journal.
[2]
Erhard Rahm,et al.
Multi-Dimensional Database Allocation for Parallel Data Warehouses
,
2000,
VLDB.
[3]
Matthias Nicola,et al.
Storage Layout and I/O Performance Tuning for IBM® Red Brick® Data Warehouse
,
2002
.
[4]
Garth A. Gibson,et al.
RAID: high-performance, reliable secondary storage
,
1994,
CSUR.
[5]
Juan Loaiza.
OPTIMAL STORAGE CONFIGURATION MADE EASY
,
2000
.
[6]
David J. DeWitt,et al.
Data placement in shared-nothing parallel database systems
,
1997,
The VLDB Journal.