Exploiting communication concurrency on high performance computing systems
暂无分享,去创建一个
[1] Katherine A. Yelick,et al. On the conditions for efficient interoperability with threads: an experience with PGAS languages using cray communication domains , 2014, ICS '14.
[2] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[3] Torsten Hoefler,et al. Mpi on Millions of Cores * , 2022 .
[4] David H. Bailey,et al. The NAS parallel benchmarks summary and preliminary results , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[5] Rajeev Thakur,et al. Enabling MPI interoperability through flexible communication endpoints , 2013, EuroMPI.
[6] Rajeev Thakur,et al. Test suite for evaluating performance of multithreaded MPI communication , 2009, Parallel Comput..
[7] Rajeev Thakur,et al. Enabling Concurrent Multithreaded MPI Communication on Multicore Petascale Systems , 2010, EuroMPI.
[8] Yuanyuan Yang,et al. Efficient all-to-all broadcast in all-port mesh and torus networks , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[9] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..
[10] Georg Hager,et al. Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes , 2009, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing.
[11] Dhabaleswar K. Panda,et al. Congestion avoidance on manycore high performance computing systems , 2012, ICS '12.
[12] Bronis R. de Supinski,et al. Minimizing MPI Resource Contention in Multithreaded Multicore Environments , 2010, 2010 IEEE International Conference on Cluster Computing.
[13] Samuel Williams,et al. Optimization of geometric multigrid for emerging multi- and manycore processors , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[15] Leonid Oliker,et al. Implementation and Optimization of miniGMG - a Compact Geometric Multigrid Benchmark , 2012 .
[16] Dhabaleswar K. Panda,et al. Initial study of multi-endpoint runtime for MPI+OpenMP hybrid programming model on multi-core systems , 2014, PPoPP '14.
[17] Laxmikant V. Kalé,et al. Scaling all-to-all multicast on fat-tree networks , 2004, Proceedings. Tenth International Conference on Parallel and Distributed Systems, 2004. ICPADS 2004..
[18] Franck Cappello,et al. MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[19] Bryan Carpenter,et al. ARMCI: A Portable Remote Memory Copy Libray for Ditributed Array Libraries and Compiler Run-Time Systems , 1999, IPPS/SPDP Workshops.
[20] Pavan Balaji,et al. MT-MPI: multithreaded MPI for many-core environments , 2014, ICS '14.
[21] Samuel Williams,et al. Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[22] Vivek Sarkar,et al. Integrating Asynchronous Task Parallelism with MPI , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[23] D Bonachea,et al. UPC Language and Library Specifications, Version 1.3 , 2013 .
[24] Jack Dongarra,et al. Special Issue on Program Generation, Optimization, and Platform Adaptation , 2005, Proc. IEEE.
[25] Dan Bonachea. GASNet Specification, v1.1 , 2002 .
[26] Katherine A. Yelick,et al. Hybrid PGAS runtime support for multicore nodes , 2010, PGAS '10.
[27] Samuel Williams,et al. Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms , 2011, Parallel Comput..
[28] Katherine A. Yelick,et al. An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[29] Samuel Williams,et al. Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[30] Yuanyuan Yang,et al. Near-optimal all-to-all broadcast in multidimensional all-port meshes and tori , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.