An Improved Discrepancy Approach to Declustering

The last decade saw dramatic improvements in computer processing speed and storage capacity. Nowadays, the bottleneck in data-intensive applications is disk I/O, that is, the time needed to retrieve typically large amounts of data from storage devices. One idea to overcome this obstacle is to spread the data across the disks of a multi-disk system so that it can be retrieved in parallel. The data allocation is determined by declustering schemes. Their aim is to allocate the data in such a manner that typical requests find their data evenly distributed over the disks. The declustering problem is to assign each data block of a multi-dimensional grid system to one of M storage devices in a balanced manner. More precisely, our grid is V = [n1] × ··· × [nd] for some positive integers n1, ..., nd. A query Q requests the data assigned to a sub-grid [x1..y1] × ··· × [xd..yd] for some integers 1 ≤ xi ≤ yi ≤ ni. We assume that the time to process such