I/O Scheduling Service for Multi-Application Clusters

Distributed applications, especially I/O-intensive ones, often access the storage subsystem in a non-sequential way (strided requests). Since such behaviour lowers overall system performance, many applications use parallel I/O libraries such as ROMIO to gather and reorder requests. Meanwhile, as cluster usage grows, several applications are often executed concurrently, competing for access to the storage subsystems and thus potentially cancelling the optimisations brought by parallel I/O libraries. The aIOLi project aims at optimising I/O accesses within the cluster while providing a simple POSIX API. This article presents an extension of aIOLi that addresses the disjoint accesses generated by distinct applications running concurrently on a cluster. In such a context, a good trade-off has to be found between performance, fairness and response time. To achieve this, an I/O scheduling algorithm and a "requests aggregator", both taking into account application access patterns as well as the global system load, have been designed and merged into aIOLi. This improvement led to the implementation of a new generic framework pluggable into any file system I/O layer. A test composed of two concurrent IOR benchmarks showed improvements on read accesses by a factor ranging from 3.5 to 35 with POSIX calls and from 3.3 to 5 with ROMIO; both reference benchmarks were executed on a traditional NFS server without any additional optimisation.
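
To make the two mechanisms named in the abstract concrete, the following is a minimal sketch of a request aggregator that coalesces contiguous accesses issued by each application, and of a scheduler that trades aggregate size (throughput) against waiting time (fairness and response time). It is illustrative only: the class names, the linear scoring rule and the aging_weight parameter are assumptions for this sketch, not taken from the aIOLi implementation.

```python
# Illustrative sketch only: coalescing plus an aging-based trade-off,
# not the actual aIOLi scheduling algorithm.
import time
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    offset: int                                   # byte offset within the file
    length: int = field(compare=False)
    app_id: int = field(compare=False)
    arrival: float = field(default_factory=time.monotonic, compare=False)


class Aggregator:
    """Coalesce contiguous or overlapping requests of each application."""

    def __init__(self):
        self.queues = {}                          # app_id -> list[Request]

    def submit(self, req: Request):
        self.queues.setdefault(req.app_id, []).append(req)

    def batches(self):
        """Yield (app_id, batch) pairs of offset-contiguous requests."""
        for app_id, reqs in self.queues.items():
            reqs.sort()                           # order by offset
            batch, end = [reqs[0]], reqs[0].offset + reqs[0].length
            for r in reqs[1:]:
                if r.offset <= end:               # contiguous/overlapping: merge
                    batch.append(r)
                    end = max(end, r.offset + r.length)
                else:                             # gap: emit the current batch
                    yield app_id, batch
                    batch, end = [r], r.offset + r.length
            yield app_id, batch


class Scheduler:
    """Pick the next batch to serve: large batches favour global throughput,
    old batches bound each application's response time (simple aging)."""

    def __init__(self, aging_weight: float = 1.0):
        self.aging_weight = aging_weight

    def next_batch(self, aggregator: Aggregator):
        now = time.monotonic()
        best, best_score = None, float("-inf")
        for app_id, batch in aggregator.batches():
            size = sum(r.length for r in batch)
            age = now - min(r.arrival for r in batch)
            score = size + self.aging_weight * age
            if score > best_score:
                best, best_score = (app_id, batch), score
        return best                               # None if nothing is queued
```

In this sketch, a high aging_weight lets old requests win quickly (short response time, weaker aggregation), while a low value lets the scheduler favour larger contiguous batches (better throughput at the cost of fairness).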
