Prefetching in segmented disk cache for multi-disk systems

This paper investigates the performance of a multi-disk storage system equipped with a segmented disk cache and processing a workload of multiple relational scans. Prefetching is a popular method of improving scan performance, and many modern disks have a multisegment cache that can be used for it. We observe that, when declustering is used as the data placement method, prefetching in a segmented cache causes a load imbalance among the disks. A single disk becomes a bottleneck and degrades the performance of the entire system. Variation in disk queue length is a primary cause of the imbalance. Using a precise simulation model, we investigate several approaches to achieving better balance. Our metrics are scan response time for the closed-end system and the ability to sustain a workload without saturating for the open-end system. We arrive at two main conclusions: (1) Prefetching in main memory is inexpensive and effective for balancing, and it can supplement or replace prefetching in the disk cache. (2) Disk-level prefetching provides about the same performance as main memory prefetching if request queues are managed in the disk controllers rather than in the host. Checking the disk cache before queuing a request not only improves request response time but also drastically improves balancing. With this method, a single cache performs better than a segmented cache.
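The idea of checking the disk cache before queuing a request can be illustrated with a minimal sketch. This is a toy model under stated assumptions, not the paper's simulation model: the `Disk` class, its `prefetch_depth` parameter, the FIFO queue, and the policy of prefetching the blocks that immediately follow the one just read are all hypothetical choices made for illustration.

```python
from collections import deque


class Disk:
    """Toy disk with a single (unsegmented) cache and a FIFO request queue.

    All names and policies here are illustrative assumptions.
    """

    def __init__(self, prefetch_depth=4):
        self.prefetch_depth = prefetch_depth  # blocks read ahead per access
        self.cache = set()                    # block numbers held in the cache
        self.queue = deque()                  # requests waiting for the media

    def submit(self, block):
        """Check the cache first; enqueue only on a miss.

        Returns True if the request was served from the cache without
        entering the queue, False if it had to be queued.
        """
        if block in self.cache:
            return True
        self.queue.append(block)
        return False

    def service_next(self):
        """Serve the request at the head of the queue.

        After reading the block, the single cache is refilled with the
        blocks that follow it (sequential prefetch).
        """
        block = self.queue.popleft()
        self.cache = {block + i for i in range(1, self.prefetch_depth + 1)}
        return block
```

In this sketch, a scan that requests block 10 misses and is queued; once the disk serves it, a request for block 11 hits the cache and never enters the queue, so it does not wait behind requests from other scans. Keeping hits out of the queue is one plausible reason queue lengths, and hence load across disks, become more even.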
