Communication-aware process and thread mapping using online communication detection
Philippe Olivier Alexandre Navaux | Matthias Diener | Eduardo Henrique Molina da Cruz | Anselm Busse | Hans-Ulrich Heiß
[1] Bernd Hamann,et al. Mapping applications with collectives over sub-communicators on torus networks , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] Josep Torrellas. Architectures for Extreme-Scale Computing , 2009, Computer.
[3] Guillaume Mercier,et al. Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments , 2009, PVM/MPI.
[4] William Gropp,et al. User's Guide for MPE: Extensions for MPI Programs , 1998 .
[5] Rob H. Bisseling,et al. Parallel hypergraph partitioning for scientific computing , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[6] Philippe Olivier Alexandre Navaux,et al. Communication-Based Mapping Using Shared Pages , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[7] Jesper Larsson Träff. Implementing the MPI process topology mechanism , 2002, SC '02.
[8] Philippe Olivier Alexandre Navaux,et al. Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[9] Brice Goglin,et al. KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework , 2013, J. Parallel Distributed Comput.
[10] William Gropp,et al. MPICH2: A New Start for MPI Implementations , 2002, PVM/MPI.
[11] Yurii A. Vlasov,et al. Technologies for exascale systems , 2011, IBM J. Res. Dev.
[12] Dong Li,et al. Hybrid MPI/OpenMP power-aware computing , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[13] Dhabaleswar K. Panda,et al. Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] George Bosilca,et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation , 2004, PVM/MPI.
[15] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl.
[16] Ian K. T. Tan,et al. Towards achieving fairness in the Linux scheduler , 2008, OPSR.
[17] Philippe Olivier Alexandre Navaux,et al. Multi-core aware process mapping and its impact on communication overhead of parallel applications , 2009, 2009 IEEE Symposium on Computers and Communications.
[18] Michael Stumm,et al. Enhancing operating system support for multicore processors by using hardware performance monitoring , 2009, OPSR.
[19] Samuel Thibault,et al. Structuring the execution of OpenMP applications for multicore architectures , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[20] Jack Dongarra,et al. Introduction to the HPCChallenge Benchmark Suite , 2004 .
[21] Kenji Ono,et al. Automatically optimized core mapping to subdomains of domain decomposition method on multicore parallel environments , 2013 .
[22] Jeffrey M. Squyres,et al. Locality-Aware Parallel Process Mapping for Multi-core HPC Systems , 2011, 2011 IEEE International Conference on Cluster Computing.
[23] F. Pellegrini,et al. Static mapping by dual recursive bipartitioning of process architecture graphs , 1994, Proceedings of IEEE Scalable High Performance Computing Conference.
[24] R. Pielke,et al. A comprehensive meteorological modeling system—RAMS , 1992 .
[25] Laxmikant V. Kale,et al. Automating Topology Aware Mapping for Supercomputers , 2010 .
[26] Emmanuel Jeannot,et al. Improving MPI Applications Performance on Multicore Clusters with Rank Reordering , 2011, EuroMPI.
[27] Emmanuel Jeannot,et al. Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques , 2014, IEEE Transactions on Parallel and Distributed Systems.
[28] Guillaume Mercier,et al. Implementation and Shared-Memory Evaluation of MPICH2 over the Nemesis Communication Subsystem , 2006, PVM/MPI.
[29] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[30] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[31] Andrew A. Chien,et al. The future of microprocessors , 2011, Commun. ACM.
[32] Michael Ott,et al. autopin - Automated Optimization of Thread-to-Core Pinning on Multicore Systems , 2011, Trans. High Perform. Embed. Archit. Compil.
[33] Georg Hager,et al. Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes , 2009, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing.
[34] Frank Mueller,et al. Feedback-directed page placement for ccNUMA via hardware-generated memory traces , 2010, J. Parallel Distributed Comput.
[35] Vipin Kumar,et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput.
[36] Laxmikant V. Kalé,et al. Topology-aware task mapping for reducing communication contention on large parallel machines , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[37] Naixue Xiong,et al. An approach for matching communication patterns in parallel applications , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[38] Torsten Hoefler,et al. Generic topology mapping strategies for large-scale parallel architectures , 2011, ICS '11.
[39] T. N. Vijaykumar,et al. Optimizing Replication, Communication, and Capacity Allocation in CMPs , 2005, ISCA 2005.
[40] Rolf Riesen,et al. Communication patterns , 2006 .
[41] Shahid H. Bokhari,et al. On the Mapping Problem , 1981, IEEE Transactions on Computers.
[42] B. Brandfass,et al. Rank reordering for MPI communication optimization , 2013 .
[43] Jonathan Green,et al. Multi-core and Network Aware MPI Topology Functions , 2011, EuroMPI.
[44] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[45] Rolf Riesen,et al. Communication patterns [message-passing patterns] , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[46] R. Vanderwijngaart,et al. NAS Parallel Benchmarks, Multi-Zone Versions , 2003 .
[47] Dhabaleswar K. Panda,et al. Design of network topology aware scheduling services for large InfiniBand clusters , 2013, 2013 IEEE International Conference on Cluster Computing (CLUSTER).
[48] Jack J. Dongarra,et al. EZTrace: A Generic Framework for Performance Analysis , 2011, 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[49] Stephen L. Olivier,et al. Exploiting Geometric Partitioning in Task Mapping for Parallel Computers , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[50] Zizhong Chen,et al. Optimizing Process-to-Core Mappings for Application Level Multi-dimensional MPI Communications , 2012, 2012 IEEE International Conference on Cluster Computing.
[51] Michael Frumkin,et al. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance , 2013 .
[52] Nectarios Koziris,et al. Performance comparison of pure MPI vs hybrid MPI-OpenMP parallelization models on SMP clusters , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[53] S. Freitas,et al. The Coupled Aerosol and Tracer Transport model to the Brazilian developments on the Regional Atmospheric Modeling System (CATT-BRAMS) – Part 1: Model description and evaluation , 2007 .
[54] Wenguang Chen,et al. MPIPP: an automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters , 2006, ICS '06.
[55] Wenguang Chen,et al. Efficiently Acquiring Communication Traces for Large-Scale Parallel Applications , 2011, IEEE Transactions on Parallel and Distributed Systems.