A Memory Congestion-Aware MPI Process Placement for Modern NUMA Systems
[1] Henri Casanova, et al. Speed and accuracy of network simulation in the SimGrid framework, 2007, ValueTools '07.
[2] Message Passing Interface Forum. MPI: A message-passing interface standard, 1994.
[3] Emmanuel Jeannot, et al. Communication and topology-aware load balancing in Charm++ with TreeMatch, 2013, 2013 IEEE International Conference on Cluster Computing (CLUSTER).
[4] Vivien Quéma, et al. Traffic management: a holistic approach to memory placement on NUMA systems, 2013, ASPLOS '13.
[5] Naixue Xiong, et al. An approach for matching communication patterns in parallel applications, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[6] Laxmikant V. Kalé, et al. Dynamic topology aware load balancing algorithms for molecular dynamics applications, 2009, ICS.
[7] George Bosilca, et al. Online Dynamic Monitoring of MPI Communications, 2017, Euro-Par.
[8] Laxmikant V. Kalé, et al. A Hierarchical Approach for Load Balancing on Parallel Multi-core Systems, 2012, 2012 41st International Conference on Parallel Processing.
[9] Robert Schöne, et al. Main memory and cache performance of Intel Sandy Bridge and AMD Bulldozer, 2014, MSPC@PLDI.
[10] Ahmad Faraj, et al. Communication Characteristics in the NAS Parallel Benchmarks, 2002, IASTED PDCS.
[11] John M. Mellor-Crummey, et al. A tool to analyze the performance of multithreaded programs on NUMA architectures, 2014, PPoPP '14.
[12] I. Lee, et al. Characterizing communication patterns of NAS-MPI benchmark programs, 2009, IEEE Southeastcon 2009.
[13] Wenguang Chen, et al. MPIPP: an automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters, 2006, ICS '06.
[14] Charles Elkan, et al. Using the Triangle Inequality to Accelerate k-Means, 2003, ICML.
[15] Franck Cappello, et al. MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[16] Robert J. Safranek, et al. Intel® QuickPath Interconnect Architectural Features Supporting Scalable System Architectures, 2010, 2010 18th IEEE Symposium on High Performance Interconnects.
[17] Philippe Olivier Alexandre Navaux, et al. Asymptotically Optimal Load Balancing for Hierarchical Multi-Core Systems, 2012, 2012 IEEE 18th International Conference on Parallel and Distributed Systems.
[18] Thomas Hérault, et al. Process Distance-Aware Adaptive MPI Collective Communications, 2011, 2011 IEEE International Conference on Cluster Computing.
[19] George Bosilca, et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, 2004, PVM/MPI.
[20] David H. Bailey, et al. The NAS Parallel Benchmarks, 1991, Int. J. High Perform. Comput. Appl.
[21] Jin Zhang, et al. Process Mapping for MPI Collective Communications, 2009, Euro-Par.
[22] Arnaud Legrand, et al. Simulating MPI Applications: The SMPI Approach, 2017, IEEE Transactions on Parallel and Distributed Systems.
[23] Emmanuel Jeannot, et al. Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques, 2014, IEEE Transactions on Parallel and Distributed Systems.
[24] Guillaume Mercier, et al. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.