Simulation-Aided Performance Evaluation of Server-Side Input/Output Optimizations

The performance of parallel distributed file systems suffers when many clients execute large numbers of operations in parallel, because the I/O subsystem is easily overwhelmed by the sheer number of incoming I/O operations. Many optimizations exist that try to alleviate this problem. Client-side optimizations perform preprocessing to minimize the amount of work the file servers have to do; server-side optimizations use server-internal knowledge to improve performance. The HDTrace framework contains components to simulate, trace, and visualize applications. It is used as a test bed to evaluate optimizations that could later be implemented in real-life projects. This paper compares existing client-side optimizations with newly implemented server-side optimizations and evaluates their usefulness for I/O patterns commonly found in HPC. Server-directed I/O chooses the order in which non-contiguous I/O operations are executed and tries to aggregate as many of them as possible, decreasing the load on the I/O subsystem and improving overall performance. The results show that server-side optimizations outperform client-side optimizations for many use cases. Integrating such optimizations into parallel distributed file systems could obviate the need for sophisticated client-side optimizations, and because of their additional knowledge of internal workflows, server-side optimizations may be better suited to provide high performance in general.
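
To make the aggregation step concrete, the following sketch shows one way a server might coalesce queued non-contiguous requests before handing them to the I/O subsystem: pending requests are sorted by file offset and adjacent or overlapping ranges are merged. This is an illustrative simplification, not the implementation evaluated in the paper; the Request type, the aggregate function, and the max_gap parameter are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Request:
    offset: int   # byte offset within the file (illustrative type)
    length: int   # number of bytes to read or write

def aggregate(requests, max_gap=0):
    """Sort pending requests by file offset and coalesce those that are
    contiguous, overlapping, or closer than max_gap bytes apart into
    single larger requests (a simplified server-directed aggregation)."""
    merged = []
    for req in sorted(requests, key=lambda r: r.offset):
        if merged and req.offset <= merged[-1].offset + merged[-1].length + max_gap:
            # Extend the previous merged request to cover this one.
            last = merged[-1]
            new_end = max(last.offset + last.length, req.offset + req.length)
            last.length = new_end - last.offset
        else:
            merged.append(Request(req.offset, req.length))
    return merged

# Example: three contiguous 4 KiB requests collapse into one 12 KiB request,
# leaving two operations for the I/O subsystem instead of four.
reqs = [Request(8192, 4096), Request(0, 4096), Request(4096, 4096), Request(65536, 4096)]
print(aggregate(reqs))  # [Request(offset=0, length=12288), Request(offset=65536, length=4096)]
```

Sorting by offset turns scattered accesses into a mostly sequential stream, which is the property server-directed I/O exploits: the server, unlike the clients, sees all outstanding requests and can reorder them globally.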
