HIDING I/O LATENCY WITH PARALLEL PRE-EXECUTION PREFETCHING

Parallel applications increasingly suffer from I/O latency because computing power grows faster than memory and storage access performance. I/O prefetching is an effective way to hide this latency, yet existing I/O prefetching techniques are conservative and their effectiveness is limited. Recent work proposed a pre-execution prefetching approach to address this “I/O wall” problem, in which a thread dedicated to read operations runs ahead of the main thread to hide I/O latency. We first identify a limitation of the existing pre-execution prefetching approach caused by read-after-write (RAW) dependencies, and then propose a method that overcomes this limitation by assigning a separate thread to each dependent read operation. Preliminary experiments, including one that uses Hill encryption as a real-life application, verify the benefits of the proposed approach.
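The idea of giving each RAW-dependent read its own thread can be illustrated with a minimal Python sketch. This is not the implementation evaluated in the paper: the `Prefetcher` class, its `prefetch` and `get` methods, and the use of a `threading.Event` to signal that a write has completed are all assumptions made for illustration. Independent reads start immediately and run ahead of the main thread, while a dependent read waits in its own thread until the write it depends on has finished.

```python
import os
import tempfile
import threading


class Prefetcher:
    """Toy pre-execution prefetcher (hypothetical, for illustration only)."""

    def __init__(self, path):
        self.path = path
        self.cache = {}      # (offset, length) -> bytes read ahead of time
        self.threads = {}    # (offset, length) -> thread performing the read
        self.lock = threading.Lock()

    def _read(self, offset, length, after):
        if after is not None:
            after.wait()     # RAW dependency: block until the write is done
        with open(self.path, "rb") as f:
            f.seek(offset)
            data = f.read(length)
        with self.lock:
            self.cache[(offset, length)] = data

    def prefetch(self, offset, length, after=None):
        # One thread per read, so a blocked dependent read
        # never stalls the independent reads.
        t = threading.Thread(target=self._read, args=(offset, length, after))
        self.threads[(offset, length)] = t
        t.start()

    def get(self, offset, length):
        # Called by the main thread when it actually needs the data.
        self.threads[(offset, length)].join()
        return self.cache[(offset, length)]


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"A" * 8)

        pf = Prefetcher(path)
        written = threading.Event()
        pf.prefetch(0, 4)                 # independent read: starts at once
        pf.prefetch(4, 4, after=written)  # dependent read: waits for the write

        # Main thread: compute, then write the bytes the second read needs.
        with open(path, "r+b") as f:
            f.seek(4)
            f.write(b"BBBB")
        written.set()                     # signal that the write has completed

        return pf.get(0, 4), pf.get(4, 4)
    finally:
        os.remove(path)
```

In this sketch, calling `demo()` returns the unmodified first four bytes and the freshly written last four bytes. A single read-ahead thread would either have to stop at the dependent read or risk fetching stale data; giving the dependent read its own thread lets the remaining reads proceed.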
