SOR: A Static File Assignment Strategy Immune to Workload Characteristic Assumptions in Parallel I/O Systems

The problem of statically assigning nonpartitioned files in a parallel I/O system has been investigated extensively. Existing solutions share a basic workload assumption: file access frequency and file size are strongly inversely correlated. In other words, the most popular files are typically small, while large files are relatively unpopular. Recent studies of web proxy traces, however, suggest that this correlation, if it exists at all, is so weak that it can be ignored. Two questions therefore arise naturally. First, do existing algorithms still perform well when the workload assumption does not hold? Second, if not, can a new file assignment strategy be developed that is immune to this assumption? To answer these questions, we first evaluate the performance of three well-known file assignment algorithms both with and without the workload assumption. We then develop a novel static file assignment strategy for parallel I/O systems, called static round-robin (SOR), which is immune to the workload assumption. Comprehensive experimental results show that SOR consistently and noticeably reduces mean response time compared with the existing schemes.
