Paper Information - HIDING I/O LATENCY WITH PARALLEL PRE-EXECUTION PREFETCHING


Parallel applications suffer increasingly from I/O latency as computing power grows faster than memory and storage access performance. I/O prefetching is an effective way to hide this latency, yet existing I/O prefetching techniques are conservative and their effectiveness is limited. Recent work has proposed a pre-execution prefetching approach, in which a thread dedicated to read operations runs ahead of the main thread in order to hide I/O latency, as a solution to this “I/O wall” problem. We first identify a limitation of the existing pre-execution prefetching approach that arises from read-after-write (RAW) dependencies, and then propose a method that overcomes this limitation by assigning a separate thread to each dependent read operation. Preliminary experiments, including one that uses Hill encryption as a real-life application, verify the benefits of the proposed approach.
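The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: Python threads stand in for the parallel runtime, I/O is simulated with `time.sleep`, and every name here (`SimulatedStore`, `run_with_prefetch`, `LATENCY`) is hypothetical. One prefetch thread runs ahead of the main thread for independent reads, while a read that depends on an earlier write gets its own thread that waits until that write has completed.

```python
import threading
import time

LATENCY = 0.01  # simulated I/O latency in seconds (assumed value)


class SimulatedStore:
    """Hypothetical stand-in for a file or storage device with slow access."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)
        self._lock = threading.Lock()

    def read(self, key):
        time.sleep(LATENCY)  # pretend the read is slow
        with self._lock:
            return self._blocks[key]

    def write(self, key, value):
        time.sleep(LATENCY)  # pretend the write is slow
        with self._lock:
            self._blocks[key] = value


def run_with_prefetch(store, independent_keys, dependent_key):
    cache = {}
    ready = {k: threading.Event() for k in list(independent_keys) + [dependent_key]}
    written = threading.Event()  # set once the main thread's write is done

    def prefetch(key, wait_for=None):
        # A dependent read must not start before the write it depends on.
        if wait_for is not None:
            wait_for.wait()
        cache[key] = store.read(key)
        ready[key].set()

    def prefetch_independent():
        # The read-ahead thread: issues independent reads before they are needed.
        for key in independent_keys:
            prefetch(key)

    threads = [
        threading.Thread(target=prefetch_independent),
        # Dedicated thread for the read that has a RAW dependency.
        threading.Thread(target=prefetch, args=(dependent_key, written)),
    ]
    for t in threads:
        t.start()

    # Main thread: consume prefetched data, then write a result.
    total = 0
    for key in independent_keys:
        ready[key].wait()
        total += cache[key]
    store.write(dependent_key, total)
    written.set()  # release the dependent read

    # Other computation could overlap with the dependent read here.
    ready[dependent_key].wait()
    result = cache[dependent_key]
    for t in threads:
        t.join()
    return result


store = SimulatedStore({"a": 1, "b": 2, "c": 3})
result = run_with_prefetch(store, ["a", "b", "c"], "out")
```

In this sketch the dependent read of `"out"` returns the value the main thread just wrote, because its thread blocks on the `written` event rather than running ahead blindly; a single read-ahead thread without that synchronization could fetch the block before the write happens.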

Yue Zhao | Kenji Yoshigoe
