File Assignment in Parallel I/O Systems with Minimal Variance of Service Time

We address the problem of assigning nonpartitioned files in a parallel I/O system where file accesses arrive according to a Poisson process and have fixed service times. We present two new file assignment algorithms, based on open queuing networks, that aim to simultaneously balance the load across all disks and minimize the variance of the service time at each disk. We first present an off-line algorithm, Sort Partition, which assigns files with similar access times to each disk. Next, we show that, provided a perfectly balanced file assignment can be found for a given set of files, Sort Partition will find the one with minimal mean response time. We then present an on-line algorithm, Hybrid Partition, that assigns groups of files with similar service times in successive intervals while guaranteeing that the load imbalance at any point does not exceed a certain threshold. We report on experiments with synthetic workloads that exhibit skew in file accesses and sizes, and we compare the performance of our new algorithms with that of the vanilla greedy file allocation algorithm.
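The off-line idea described above can be illustrated with a minimal sketch. It is not the authors' exact procedure: it assumes that a file's load is its access rate times its service time, that files are sorted by service time and packed into contiguous groups of roughly equal load, and that each disk behaves as an M/G/1 queue so its mean response time follows the Pollaczek-Khinchine formula. The names `sort_partition` and `disk_mean_response` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical file record: (name, access_rate, service_time).
File = Tuple[str, float, float]


def sort_partition(files: List[File], num_disks: int) -> List[List[File]]:
    """Sketch: sort by service time, then cut the sorted list into
    contiguous groups whose load is close to the average disk load."""
    ordered = sorted(files, key=lambda f: f[2])
    # Assumption: load of a file = access rate * service time.
    total_load = sum(rate * svc for _, rate, svc in ordered)
    target = total_load / num_disks
    disks: List[List[File]] = [[] for _ in range(num_disks)]
    disk, load = 0, 0.0
    for f in ordered:
        file_load = f[1] * f[2]
        # Move to the next disk once adding this file would exceed the
        # target load; the last disk takes whatever remains.
        if disk < num_disks - 1 and disks[disk] and load + file_load > target + 1e-9:
            disk += 1
            load = 0.0
        disks[disk].append(f)
        load += file_load
    return disks


def disk_mean_response(disk_files: List[File]) -> float:
    """Mean response time of one disk modeled as an M/G/1 queue
    (Pollaczek-Khinchine): T = E[S] + lambda * E[S^2] / (2 * (1 - rho))."""
    lam = sum(rate for _, rate, _ in disk_files)
    rho = sum(rate * svc for _, rate, svc in disk_files)
    if lam == 0.0:
        return 0.0
    if rho >= 1.0:
        raise ValueError("disk is overloaded (utilization >= 1)")
    lam_second_moment = sum(rate * svc * svc for _, rate, svc in disk_files)
    return rho / lam + lam_second_moment / (2.0 * (1.0 - rho))


if __name__ == "__main__":
    # Illustrative workload only; every file contributes the same load.
    workload = [("a", 0.25, 1.0), ("c", 0.0625, 4.0),
                ("b", 0.125, 2.0), ("d", 0.0625, 4.0)]
    for i, group in enumerate(sort_partition(workload, 2)):
        print(i, [name for name, _, _ in group], disk_mean_response(group))
```

Grouping files with similar service times keeps the second moment of the service time on each disk small, which is the term that drives queueing delay in the formula above.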
