Toward an evolutionary task parallel integrated MPI + X programming model
Stephen L. Olivier | Courtenay T. Vaughan | Richard F. Barrett | Kevin T. Pedretti | Ryan E. Grant | Dylan T. Stark
[1] D. Roweth et al. Cray XC® Series Network, 2012.
[2] Leslie G. Valiant. A bridging model for parallel computation, 1990, CACM.
[3] Ahmad Afsahi et al. A Speculative and Adaptive MPI Rendezvous Protocol Over RDMA-enabled Interconnects, 2009, International Journal of Parallel Programming.
[4] Courtenay T. Vaughan et al. Using the Cray Gemini Performance Counters, 2013.
[5] Eduard Ayguadé et al. Overlapping communication and computation by using a hybrid MPI/SMPSs approach, 2010, ICS '10.
[6] Brian Vinter et al. Using overdecomposition to overlap communication latencies with computation and take advantage of SMT processors, 2006, 2006 International Conference on Parallel Processing Workshops (ICPPW'06).
[7] Carl Edward Oliver et al. Scientific Discovery through Advanced Computing, 2001, International Conference on Computational Science.
[8] Keith D. Underwood et al. SeaStar Interconnect: Balanced Bandwidth for Scalable Performance, 2006, IEEE Micro.
[9] Mahesh Rajan et al. Application-Driven Acceptance of Cielo, an XE6 Petascale Capability Platform, 2011.
[10] Douglas Doerfler et al. Measuring MPI Send and Receive Overhead and Application Availability in High Performance Network Interfaces, 2006, PVM/MPI.
[11] Samuel Williams et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors, 2007, SIAM Rev.
[12] MiniGhost: A Miniapp for Exploring Boundary Exchange Strategies Using Stencil Computations in Scientific Parallel Computing, 2012, Sandia Report.
[13] Stephen W. Poole et al. Overlapping computation and communication: Barrier algorithms and ConnectX-2 CORE-Direct capabilities, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW).
[14] Richard F. Barrett et al. A Taxonomy of MPI-Oriented Usage Models in Parallelized Scientific Codes, 2009, Software Engineering Research and Practice.
[15] Courtenay T. Vaughan et al. Reducing the Bulk in the Bulk Synchronous Parallel Model, 2013, Parallel Process. Lett.
[16] Mauricio Araya-Polo et al. Towards a Multi-Level Cache Performance Model for 3D Stencil Computation, 2011, ICCS.
[17] Douglas Thain et al. Qthreads: An API for programming with millions of lightweight threads, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[18] Emmanuel Agullo et al. Tile QR factorization with parallel panel processing for multicore architectures, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[19] William J. Dally et al. Technology-Driven, Highly-Scalable Dragonfly Topology, 2008, 2008 International Symposium on Computer Architecture.
[20] Torsten Hoefler et al. Implementation and performance analysis of non-blocking collective operations for MPI, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[21] C. T. Vaughan et al. Assessing the role of mini-applications in predicting key performance characteristics of scientific and engineering applications, 2015, J. Parallel Distributed Comput.
[22] Jack J. Dongarra et al. Towards dense linear algebra for hybrid GPU accelerated manycore systems, 2009, Parallel Comput.
[23] John Shalf et al. The International Exascale Software Project roadmap, 2011, Int. J. High Perform. Comput. Appl.
[24] Qingyu Meng et al. Scalable large-scale fluid–structure interaction solvers in the Uintah framework via hybrid task-based parallelism algorithms, 2014, Concurr. Comput. Pract. Exp.
[25] Vivek Sarkar et al. Integrating Asynchronous Task Parallelism with MPI, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[26] Apan Qasem et al. Understanding stencil code performance on multicore architectures, 2011, CF '11.
[27] MPI at Exascale, 2010.
[28] Tiling Stencil Computations to Maximize Parallelism, 2013, thesis.
[29] Stephen L. Olivier et al. Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications, 2014, 2014 Workshop on Exascale MPI at Supercomputing Conference.