Optimizing I/O Performance of HPC Applications with Autotuning
Surendra Byna | Prabhat | Marc Snir | Babak Behzad
[1] A. Adelmann,et al. Progress on H5Part: a portable high performance parallel data interface for electromagnetics simulations , 2007, 2007 IEEE Particle Accelerator Conference (PAC).
[2] Arif Merchant,et al. Minerva: An automated resource provisioning tool for large-scale storage systems , 2001, TOCS.
[3] Akio Arakawa,et al. Clouds and Climate: A Problem That Refuses to Die , 2022 .
[4] Kalyanmoy Deb,et al. A Computationally Efficient Evolutionary Algorithm for Real-Parameter Optimization , 2002, Evolutionary Computation.
[5] Harvey Richardson,et al. High Performance Fortran: history, overview and current developments , 1996 .
[6] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[7] Rajeev Thakur,et al. Data sieving and collective I/O in ROMIO , 1998, Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation.
[8] Andrew A. Chien,et al. Performance Modeling of a Parallel I/O System: An Application Driven Approach , 1997, PPSC.
[9] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[10] Robert Latham,et al. I/O performance challenges at leadership scale , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[11] Avishek Saha,et al. Characterization and modeling of PIDX parallel I/O for performance optimization , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[12] Alden H. Wright,et al. Genetic Algorithms for Real Parameter Optimization , 1990, FOGA.
[13] Marianne Winslett,et al. Automatic parallel I/O performance optimization using genetic algorithms , 1998, Proceedings. The Seventh International Symposium on High Performance Distributed Computing (Cat. No.98TB100244).
[14] Weizhe Zhang,et al. Automatic Generation of I/O Kernels for HPC Applications , 2014, 2014 9th Parallel Data Storage Workshop.
[15] Surendra Byna,et al. Improving parallel I/O autotuning with performance modeling , 2014, HPDC '14.
[16] Arie Shoshani,et al. Parallel I/O, analysis, and visualization of a trillion particle simulation , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[17] Wei-keng Liao,et al. Dynamically adapting file domain partitioning methods for collective I/O based on underlying parallel file system locking protocols , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[18] Qing Liu,et al. The Design of an Auto-Tuning I/O Framework on Cray XT5 System , 2011 .
[19] Thomas Bäck,et al. An Overview of Evolutionary Algorithms for Parameter Optimization , 1993, Evolutionary Computation.
[20] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[21] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..
[22] Daniel A. Reed,et al. A Comparison of Logical and Physical Parallel I/O Patterns , 1998, Int. J. High Perform. Comput. Appl..
[23] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[24] Dror G. Feitelson,et al. Overview of the MPI-IO Parallel I/O Interface , 1996, Input/Output in Parallel and Distributed Computer Systems.
[25] Samuel Williams,et al. PERI - auto-tuning memory-intensive kernels for multicore , 2008 .
[26] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[27] Marianne Winslett,et al. A multi-level approach for understanding I/O activity in HPC applications , 2013, 2013 IEEE International Conference on Cluster Computing (CLUSTER).
[28] Evgenia Smirni,et al. Lessons from Characterizing the Input/Output Behavior of Parallel Scientific Applications , 1998, Perform. Evaluation.
[29] Michael A. Laurenzano,et al. Modeling and Predicting Disk I/O Time of HPC Applications , 2010, 2010 DoD High Performance Computing Modernization Program Users Group Conference.
[30] Andrew J. Hutton,et al. Lustre: Building a File System for 1,000-node Clusters , 2003 .
[31] Carlos Maltzahn,et al. I/O acceleration with pattern detection , 2013, HPDC.
[32] Ananta Tiwari,et al. Online Adaptive Code Generation and Tuning , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[33] K. Bowers,et al. Ultrahigh performance three-dimensional electromagnetic relativistic kinetic plasma simulation , 2008 .
[34] Ray W. Grout,et al. Skel: Generative Software for Producing Skeletal I/O Applications , 2011, 2011 IEEE Seventh International Conference on e-Science Workshops.
[35] Houjun Tang,et al. Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data , 2015, 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[36] Marianne Winslett,et al. Automatic parallel I/O performance optimization in Panda , 1998, SPAA '98.
[37] John Shalf,et al. Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[38] Eric Anderson,et al. Hippodrome: Running Circles around Storage Administration , 2002, FAST.
[39] Surendra Byna,et al. Taming parallel I/O complexity with auto-tuning , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[40] Francine Berman,et al. Heuristics for scheduling parameter sweep applications in grid environments , 2000, Proceedings 9th Heterogeneous Computing Workshop (HCW 2000) (Cat. No.PR00556).
[41] J. Cary,et al. VORPAL: a versatile plasma simulation code , 2004 .
[42] Robert B. Ross,et al. Omnisc'IO: A Grammar-Based Approach to Spatial and Temporal I/O Patterns Prediction , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[43] Thomas Fahringer,et al. A multi-objective auto-tuning framework for parallel codes , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[44] Marianne Winslett,et al. Performance Modeling for the Panda Array I/O Library , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[45] Surendra Byna,et al. Parallel I/O prefetching using MPI file caching and I/O signatures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[46] Christos Faloutsos,et al. Using Utility to Provision Storage Systems , 2008, FAST.
[47] Alok N. Choudhary,et al. Improved parallel I/O via a two-phase run-time access strategy , 1993, CARN.
[48] Xian-He Sun,et al. Cost-intelligent application-specific data layout optimization for parallel file systems , 2013, Cluster Computing.
[49] Jack J. Dongarra,et al. A comparison of search heuristics for empirical code optimization , 2008, 2008 IEEE International Conference on Cluster Computing.