Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs
[1] Gerhard Wellein,et al. Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs, 2018, ISC.
[2] Katherine E. Isaacs,et al. There goes the neighborhood: Performance degradation due to nearby jobs , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[3] Roger W. Hockney,et al. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2 , 1994, Parallel Computing.
[4] Georg Hager,et al. Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes , 2009, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing.
[5] Markus Geimer,et al. Identifying the Root Causes of Wait States in Large-Scale Parallel Applications , 2010, 2010 39th International Conference on Parallel Processing.
[6] George Michelogiannakis,et al. The Pitfalls of Provisioning Exascale Networks: A Trace Replay Analysis for Understanding Communication Performance , 2018, ISC.
[7] F. Petrini,et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[8] Georg Hager,et al. On the accuracy and usefulness of analytic energy models for contemporary multicore processors , 2018, ISC.
[9] Gerhard Wellein,et al. Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server Processors , 2020, Supercomput. Front. Innov..
[10] Gerhard Wellein,et al. High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations , 2015, J. Comput. Phys..
[11] David W. Walker,et al. Performance analysis of a hybrid MPI/OpenMP application on multi-core clusters , 2010, J. Comput. Sci..
[12] Gerhard Wellein,et al. Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model , 2014, ICS.
[13] Xingfu Wu,et al. Using Processor Partitioning to Evaluate the Performance of MPI, OpenMP and Hybrid Parallel Applications on Dual- and Quad-core Cray XT4 Systems, 2009.
[14] Manish Parashar,et al. Local recovery and failure masking for stencil-based applications at extreme scales , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[15] Erwin Laure,et al. Idle waves in high-performance computing, 2015, Physical review. E, Statistical, nonlinear, and soft matter physics.
[16] Gerhard Wellein,et al. Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study , 2019, 2019 IEEE International Conference on Cluster Computing (CLUSTER).
[17] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[18] A. Lumsdaine,et al. LogGOPSim: simulating large-scale applications in the LogGOPS model , 2010, HPDC '10.
[19] Gerhard Wellein,et al. Delay Flow Mechanisms on Clusters, 2019.
[20] Adam Moody,et al. System Noise Revisited: Enabling Application Scalability and Reproducibility with SMT , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[21] Yutaka Ishikawa,et al. Hardware Performance Variation: A Comparative Study Using Lightweight Kernels , 2018, ISC.
[22] Gerhard Wellein,et al. Performance Engineering of the Kernel Polynomal Method on Large-Scale CPU-GPU Systems , 2014, 2015 IEEE International Parallel and Distributed Processing Symposium.