Masking I/O latency using application level I/O caching and prefetching on Blue Gene systems

We present an application-level I/O caching, prefetching, and asynchronous access system that hides the access latency experienced by HPC applications. This user-controllable system maintains a file I/O cache in the application's user space, analyzes I/O access patterns, issues prefetch requests, and asynchronously writes dirty data back to storage. As a result, the application does not pay the full I/O latency penalty of going to storage each time it needs data. We have implemented this caching and asynchronous access system on the Blue Gene (BG/L and BG/P) systems. We present experimental results with the NAS BT, MADbench, and WRF benchmarks. The results on the BG/P system demonstrate that our method hides access latency, improves application I/O access time by as much as 100%, and improves WRF execution time by more than 10%.
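The mechanism described above (a user-space cache, access-pattern analysis, prefetching, and asynchronous write-back) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `SlowStore` and `CachedFile`, the 4-byte block size, the single worker thread, and the one-rule sequential detector are all assumptions chosen to keep the example small.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4  # bytes per cache block (assumed; tiny so the example is easy to trace)


class SlowStore:
    """Hypothetical stand-in for the storage system behind the cache."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.nblocks = (len(data) + BLOCK - 1) // BLOCK
        self.lock = threading.Lock()
        self.reads = 0  # number of block reads that reached "storage"

    def read_block(self, idx):
        with self.lock:
            self.reads += 1
            return bytes(self.data[idx * BLOCK:(idx + 1) * BLOCK])

    def write_block(self, idx, payload):
        with self.lock:
            self.data[idx * BLOCK:idx * BLOCK + len(payload)] = payload


class CachedFile:
    """User-space block cache with sequential prefetch and deferred write-back."""

    def __init__(self, store):
        self.store = store
        self.blocks = {}    # block index -> bytearray cached in user space
        self.pending = {}   # block index -> Future of an in-flight prefetch
        self.dirty = set()  # indices modified in the cache but not yet stored
        self.last = None    # last block index the application read
        self.hits = 0       # accesses served by the cache or by a prefetch
        self.pool = ThreadPoolExecutor(max_workers=1)

    def _load(self, idx):
        if idx in self.blocks:
            self.hits += 1
        elif idx in self.pending:
            # A prefetch was already issued; wait only for what is left of it.
            self.hits += 1
            self.blocks[idx] = bytearray(self.pending.pop(idx).result())
        else:
            # Cache miss: pay the full latency of a synchronous read.
            self.blocks[idx] = bytearray(self.store.read_block(idx))
        return self.blocks[idx]

    def read_block(self, idx):
        block = self._load(idx)
        # Pattern analysis reduced to one rule (an assumption of this sketch):
        # two consecutive indices suggest a sequential scan, so start
        # fetching the next block in the background.
        nxt = idx + 1
        if (self.last == idx - 1 and nxt < self.store.nblocks
                and nxt not in self.blocks and nxt not in self.pending):
            self.pending[nxt] = self.pool.submit(self.store.read_block, nxt)
        self.last = idx
        return bytes(block)

    def write_block(self, idx, payload):
        # Update only the cached copy; storage is written later.
        self._load(idx)[:len(payload)] = payload
        self.dirty.add(idx)

    def flush(self):
        # Hand dirty blocks to the worker thread, then wait for completion.
        futures = [self.pool.submit(self.store.write_block, i, bytes(self.blocks[i]))
                   for i in sorted(self.dirty)]
        self.dirty.clear()
        for future in futures:
            future.result()

    def close(self):
        self.flush()
        self.pool.shutdown()


store = SlowStore(b"aaaabbbbccccdddd")
f = CachedFile(store)
scan = b"".join(f.read_block(i) for i in range(store.nblocks))
f.write_block(1, b"XXXX")  # storage still holds the old bytes here
f.close()                  # write-back happens on flush/close
```

In this sketch prefetching does not reduce the number of reads that reach storage; it issues them before the application asks, so they overlap with whatever the application does between requests. A real system would also need eviction, cache-size limits, and coordination across processes, none of which is modeled here.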
