Opportunistic Data-driven Execution of Parallel Programs for Efficient I/O Services

A parallel system relies on both process scheduling and I/O scheduling for efficient use of resources, and a program's performance hinges on the resource on which it is bottlenecked. Existing process schedulers and I/O schedulers operate independently. However, when the bottleneck is I/O, there is an opportunity to alleviate it through cooperation between the I/O and process schedulers: the service efficiency of I/O requests can depend strongly on the order in which they are issued, and that order is in turn heavily influenced by process scheduling. We propose a data-driven program execution mode in which process scheduling and request issuance are coordinated to facilitate effective I/O scheduling for high disk efficiency. Our implementation, DualPar, uses process suspension and resumption, together with pre-execution and prefetching techniques, to provide the I/O scheduler with a pool of pre-sorted requests. This data-driven execution mode is enabled when I/O is detected to be the bottleneck; otherwise, the program runs in the normal computation-driven mode. DualPar is implemented in the MPICH2 MPI-IO library, where it coordinates I/O service and process execution for MPI programs. Our experiments on a 120-node cluster using the PVFS2 file system show that DualPar can increase system I/O throughput by 31% on average compared to existing MPI-IO, with or without collective I/O.
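The effect of issuance order on disk efficiency, and the switch between the two execution modes, can be illustrated with a small simulation. This is a toy Python sketch under stated assumptions, not the authors' MPICH2 code: the names `Request`, `seek_distance`, `choose_mode`, `computation_driven_order`, and `data_driven_order` are hypothetical; the bottleneck test (an I/O-wait fraction compared against a 0.5 threshold) is an assumed stand-in for whatever detection the real system uses; each process's future requests are assumed to be already known, as pre-execution would provide; and total head movement is used as a simple proxy for disk efficiency.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    proc: int    # issuing process (e.g. an MPI rank)
    offset: int  # file offset, in blocks


def seek_distance(order: List[Request], start: int = 0) -> int:
    """Total head movement (in blocks) to serve requests in the given order."""
    pos, total = start, 0
    for r in order:
        total += abs(r.offset - pos)
        pos = r.offset
    return total


def choose_mode(io_wait_time: float, total_time: float, threshold: float = 0.5) -> str:
    """Hypothetical bottleneck test: use data-driven mode when the fraction
    of time spent waiting on I/O exceeds `threshold`."""
    if total_time > 0 and io_wait_time / total_time > threshold:
        return "data-driven"
    return "computation-driven"


def computation_driven_order(per_proc: List[List[Request]]) -> List[Request]:
    """Approximates free-running processes: each process issues its next
    request in turn, so requests from different processes interleave."""
    order: List[Request] = []
    for step in range(max((len(q) for q in per_proc), default=0)):
        for q in per_proc:
            if step < len(q):
                order.append(q[step])
    return order


def data_driven_order(per_proc: List[List[Request]]) -> List[Request]:
    """Processes are (conceptually) suspended while their future requests,
    assumed known from pre-execution, are gathered into one pool; the pool
    is sorted by offset before issuance."""
    pool = [r for q in per_proc for r in q]
    return sorted(pool, key=lambda r: r.offset)


if __name__ == "__main__":
    # Two processes, each reading four consecutive blocks in its own region.
    per_proc = [[Request(p, p * 1000 + i) for i in range(4)] for p in range(2)]
    mode = choose_mode(io_wait_time=8.0, total_time=10.0)
    order = (data_driven_order(per_proc) if mode == "data-driven"
             else computation_driven_order(per_proc))
    print(mode, seek_distance(computation_driven_order(per_proc)), seek_distance(order))
    # prints: data-driven 6997 1003
```

In this example the interleaved order makes the head jump between the two regions on almost every request, while the sorted pool serves each region sequentially. The sketch deliberately omits the prefetching and resumption machinery; it only shows why handing the scheduler a pre-sorted pool, instead of whatever order free-running processes happen to produce, can reduce disk seeking.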
