Memory Bandwidth Contention: Communication vs Computation Tradeoffs in Supercomputers with Multicore Architectures
[1] Scott B. Baden,et al. Scalable Heterogeneous CPU-GPU Computations for Unstructured Tetrahedral Meshes , 2015, IEEE Micro.
[2] Nan Wu,et al. Parallel performance modeling of irregular applications in cell-centered finite volume methods over unstructured tetrahedral meshes , 2015, J. Parallel Distributed Comput..
[3] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[5] Dhabaleswar K. Panda,et al. Microbenchmark performance comparison of high-speed cluster interconnects , 2004, IEEE Micro.
[6] Scott B. Baden,et al. Overlapping communication and computation with OpenMP and MPI , 2001, Sci. Program..
[7] Daniel A. Menascé,et al. Predicting the Effect of Memory Contention in Multi-Core Computers Using Analytic Performance Models , 2015, IEEE Transactions on Computers.
[8] Brice Goglin,et al. A benchmark-based performance model for memory-bound HPC applications , 2014, 2014 International Conference on High Performance Computing & Simulation (HPCS).
[9] Odysseas I. Pentakalos. An Introduction to the InfiniBand Architecture , 2002, Int. CMG Conference.
[10] Scott B. Baden,et al. Panda: A Compiler Framework for Concurrent CPU+GPU Ex , 2016, International Journal of Parallel Programming.
[11] Xing Cai,et al. Heterogeneous CPU-GPU computing for the finite volume method on 3D unstructured meshes , 2014, 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
[12] Jun Zhou,et al. Multi-GPU Implementation of a 3D Finite Difference Time Domain Earthquake Code on Heterogeneous Supercomputers , 2013, ICCS.
[13] Xingfu Wu,et al. Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers , 2011, J. Comput. Syst. Sci..
[14] Samuel Williams,et al. Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor , 2016, ISC Workshops.
[15] Richard W. Vuduc,et al. A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method , 2014, GPGPU@ASPLOS.
[16] C. Aykanat. Hypergraph Model for Mapping Repeated Sparse-Matrix Vector Product Computations onto Multicomputers , 1995.
[17] David A. Bader,et al. A Methodology for Co-Location Aware Application Performance Modeling in Multicore Computing , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[18] Pavan Balaji,et al. Casper: An Asynchronous Progress Model for MPI RMA on Many-Core Architectures , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.