Embracing diversity: improving performance for parallel storage systems built with heterogeneous disks

Components in parallel systems are heterogeneous, and the degree of heterogeneity often increases over time as components are replaced or added. This dissertation examines how system throughput is affected when different block allocation schemes are used for parallel storage systems constructed with heterogeneous disks. We investigate the nonlinear performance properties of disks induced by their mechanical nature and by zone-bit recording. We describe a model that captures the underlying behavior of heterogeneous disks in a parallel file system under different workloads, and we show that the model predicts the actual behavior of a set of heterogeneous disks. We then compare the impact of five different allocation algorithms on system response time. We show that taking zone-bit recording into consideration can significantly reduce system response time, and that also taking the client workload into consideration can reduce it further. Lastly, we built a prototype to test the insights developed here. The prototype allowed us to run generic programs on a commodity cluster backed by a parallel storage system that could easily be reconfigured with any number of disks. We inserted four types of disks with different capacities and performance characteristics and ran three programs on the system: a parallel sequential micro-benchmark, a parallel random micro-benchmark, and a parallel scientific application. Three block allocation schemes were examined on the prototype: "uniform" (all blocks are uniformly distributed among the drives), "capacity" (blocks are distributed based on the relative size of each drive), and "adaptive" (blocks are distributed based on the relative performance of each drive).
For all the configurations we tested, "adaptive" significantly outperforms a state-of-the-art parallel file system (PVFS2), by 60% to 147% on the parallel random micro-benchmark and by 140% to 355% on the parallel sequential micro-benchmark. In all but one configuration, "adaptive" also outperforms "capacity" and "uniform". We provide insight into why "adaptive" was slightly slower than "capacity" in that one configuration.
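
The three schemes named above differ only in the weight given to each drive when blocks are placed. The following is a minimal Python sketch of that idea, not the dissertation's implementation. The `Disk` fields, the function names, and the use of a single measured throughput figure as the "performance" weight are all assumptions made for illustration. The abstract indicates that the actual adaptive scheme also accounts for zone-bit recording and the client workload, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    capacity_gb: float     # hypothetical drive capacity
    throughput_mbs: float  # hypothetical measured throughput (assumed stand-in for "performance")


def allocation_weights(disks, scheme):
    """Return the fraction of blocks each disk should receive under a scheme."""
    if scheme == "uniform":
        raw = [1.0 for _ in disks]               # every drive gets the same share
    elif scheme == "capacity":
        raw = [d.capacity_gb for d in disks]     # share proportional to drive size
    elif scheme == "adaptive":
        raw = [d.throughput_mbs for d in disks]  # share proportional to drive speed
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw)
    return [r / total for r in raw]


def distribute_blocks(n_blocks, weights):
    """Turn fractional weights into whole block counts that sum to n_blocks."""
    exact = [n_blocks * w for w in weights]
    counts = [int(e) for e in exact]  # round down first
    leftover = n_blocks - sum(counts)
    # Hand the remaining blocks to the disks with the largest fractional parts.
    order = sorted(range(len(weights)),
                   key=lambda i: exact[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Two made-up drives: a small fast one and a large slow one.
disks = [Disk("small-fast", 100.0, 60.0), Disk("large-slow", 300.0, 40.0)]
for scheme in ("uniform", "capacity", "adaptive"):
    print(scheme, distribute_blocks(1000, allocation_weights(disks, scheme)))
```

With these made-up drives, "capacity" sends most blocks to the large, slow drive while "adaptive" favors the small, fast one. This is the kind of difference the prototype comparison measures.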
