Multi-Level Load Balancing with an Integrated Runtime Approach
Laxmikant V. Kalé | Matthias Diener | Harshitha Menon | Sam White | Seonmyeong Bak