iBridge: Improving Unaligned Parallel File Access with Solid-State Drives

When files are striped in a parallel I/O system, requests to the files are decomposed into a number of sub-requests that are distributed over multiple servers. If a request is not aligned with the striping pattern, such decomposition can make the first and last sub-requests much smaller than the striping unit. Because hard-disk-based servers can be much less efficient at serving small requests than large ones, the system exhibits heterogeneity in serving sub-requests of different sizes, and the net throughput of the entire system can be severely degraded by the inefficiency of serving the smaller requests, or fragments. Because a request is not considered complete until its slowest sub-request is, the penalty is even greater for synchronous requests. To make matters worse, the larger the request, or the more data servers the requested data is striped over, the greater the detrimental performance effect of serving fragments can be. This effect can become the Achilles' heel of a parallel I/O system that seeks scalable performance through large sequential accesses. In this paper we propose iBridge, a scheme that uses solid-state drives to serve request fragments and thereby bridge the performance gap between serving fragments and serving large sub-requests. We have implemented iBridge in the PVFS file system. Our experimental results with representative MPI-IO benchmarks show that iBridge can significantly improve the I/O throughput of storage systems, especially for large requests with fragments.
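To make the fragment problem concrete, the sketch below (not part of the paper; the stripe size, request offset, request length, and server count are assumed example values) shows how an unaligned request is decomposed into per-server sub-requests, with the head and tail sub-requests falling short of the striping unit:

    /* Illustrative sketch only: decomposition of an unaligned request
     * into per-stripe sub-requests. The geometry below is a hypothetical
     * example, not taken from the paper's experimental setup. */
    #include <stdio.h>

    int main(void) {
        const long stripe  = 64 * 1024;   /* striping unit: 64 KiB (assumed)      */
        const long servers = 4;           /* number of data servers (assumed)     */
        const long offset  = 10 * 1024;   /* request start, not stripe-aligned    */
        const long length  = 256 * 1024;  /* request size: 256 KiB                */

        long cur = offset, remaining = length;
        while (remaining > 0) {
            long to_boundary = stripe - (cur % stripe);   /* bytes left in this stripe */
            long sub = remaining < to_boundary ? remaining : to_boundary;
            printf("server %ld: sub-request of %ld bytes%s\n",
                   (cur / stripe) % servers,              /* round-robin placement */
                   sub,
                   sub < stripe ? "  <-- fragment" : "");
            cur += sub;
            remaining -= sub;
        }
        return 0;
    }

Running this prints one full-stripe sub-request per interior server and two small fragments at the head and tail, the fragments that iBridge proposes to redirect to solid-state drives.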
