Duality between Prefetching and Queued Writing with Parallel Disks

Parallel disks promise to be a cost-effective means of achieving high bandwidth in applications involving massive data sets, but algorithms for parallel disks can be difficult to devise. To combat this problem, we define a useful and natural duality between writing to parallel disks and the seemingly more difficult problem of prefetching. We first explore this duality for applications involving read-once accesses using parallel disks. We obtain a simple linear-time algorithm for computing optimal prefetch schedules and analyze the efficiency of the resulting schedules for randomly placed data and for arbitrary interleaved accesses to striped sequences. Duality also provides an optimal schedule for the integrated caching and prefetching problem, in which blocks can be accessed multiple times. Another application of this duality gives us the first parallel disk sorting algorithms that are provably optimal up to lower-order terms. One of these algorithms is a simple and practical variant of multiway merge sort, addressing a question that has been open for some time.
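The abstract does not spell out the construction, so the following Python sketch illustrates only one plausible reading of the duality for read-once accesses: reverse the request sequence, simulate greedy queued writing on it, and read the resulting write steps backwards as prefetch steps. The function names, the `disk_of` mapping, the FIFO per-disk queues, and the rule "write only when the buffer is full" are all assumptions made for illustration, not details taken from the paper, and the sketch makes no claim of optimality.

```python
from collections import deque


def greedy_write_schedule(blocks, disk_of, num_disks, buffer_size):
    """Simulate queued writing (hypothetical model).

    Blocks are admitted in order into per-disk FIFO queues that share a
    buffer of `buffer_size` blocks. Whenever the buffer is full, one
    output step is performed in which every disk with a nonempty queue
    writes its oldest block. Returns the list of output steps.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    queues = [deque() for _ in range(num_disks)]
    buffered = 0
    steps = []

    def output_step():
        nonlocal buffered
        step = [q.popleft() for q in queues if q]
        buffered -= len(step)
        steps.append(step)

    for block in blocks:
        if buffered == buffer_size:
            output_step()  # make room before admitting the next block
        queues[disk_of[block]].append(block)
        buffered += 1

    while buffered:  # flush whatever is still queued
        output_step()
    return steps


def prefetch_schedule_by_duality(requests, disk_of, num_disks, buffer_size):
    """Derive prefetch steps by writing the reversed request sequence
    and then reversing the order of the write steps (assumed duality)."""
    write_steps = greedy_write_schedule(
        list(reversed(requests)), disk_of, num_disks, buffer_size
    )
    return list(reversed(write_steps))


# Example: six read-once requests spread over two disks, buffer of two blocks.
requests = ["a", "b", "c", "d", "e", "f"]
disk_of = {"a": 0, "b": 0, "c": 1, "d": 0, "e": 1, "f": 1}
schedule = prefetch_schedule_by_duality(requests, disk_of, num_disks=2, buffer_size=2)
print(schedule)  # [['a'], ['b', 'c'], ['d', 'e'], ['f']]
```

In this toy run each parallel step fetches at most one block per disk, and each disk delivers its blocks in the order they are requested. Both passes touch every block a constant number of times, which is consistent with the linear running time mentioned above.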
