Automatic translation of MPI source into a latency-tolerant, data-driven form
Scott B. Baden | Eric J. Bylaska | Tan Nguyen | Dan Quinlan | Pietro Cicotti
[1] M. Clemens,et al. Geometric multigrid method for electro- and magnetostatic field simulations using the conformal finite integration technique , 2003 .
[2] Robert A. van de Geijn,et al. Managing the complexity of lookahead for LU factorization with pivoting , 2010, SPAA '10.
[3] D. Marx. Ab initio molecular dynamics: Theory and Implementation , 2000 .
[4] Eric J. Bylaska,et al. Large‐Scale Plane‐Wave‐Based Density Functional Theory: Formalism, Parallelization, and Applications , 2011 .
[5] Dale R. Shires,et al. Program Flow Graph Construction For Static Analysis of MPI Programs , 1999, PDPTA.
[6] James Demmel,et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms , 2011, Euro-Par.
[7] Scott B. Baden,et al. Communication overlap in multi-tier parallel algorithms , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[8] Wu-chun Feng,et al. On the efficacy of GPU-integrated MPI for scientific applications , 2013, HPDC '13.
[9] William Gropp,et al. The MPI Message-Passing Interface Standard: Overview and Status , 1995 .
[10] Vivek Sarkar,et al. Software challenges in extreme scale systems , 2009 .
[11] Thomas Hérault,et al. DAGuE: A Generic Distributed DAG Engine for High Performance Computing , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[12] Ken Kennedy,et al. Telescoping Languages: A System for Automatic Generation of Domain Languages , 2005, Proceedings of the IEEE.
[13] Wu-chun Feng,et al. MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-based Systems , 2012, 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems.
[14] Aslak Tveito,et al. Numerical solution of partial differential equations on parallel computers , 2006 .
[15] Jack Dongarra,et al. ScaLAPACK Users' Guide , 1997 .
[16] Barry Wilkinson,et al. Parallel programming , 1998 .
[17] Markus Schordan,et al. Treating a user-defined parallel library as a domain-specific language , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[18] Vivek Sarkar,et al. Integrating Asynchronous Task Parallelism with MPI , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[19] Padma Raghavan,et al. A latency tolerant hybrid sparse solver using incomplete Cholesky factorization , 2003, Numer. Linear Algebra Appl..
[20] Katherine A. Yelick,et al. Multi-threading and one-sided communication in parallel LU factorization , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[21] John Shalf,et al. Exascale Computing Technology Challenges , 2010, VECPAR.
[22] Sayantan Sur,et al. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters , 2011, Computer Science - Research and Development.
[23] Jack B. Dennis,et al. Data Flow Supercomputers , 1980, Computer.
[24] Scott B. Baden,et al. Latency Hiding and Performance Tuning with Graph-Based Execution , 2011, 2011 First Workshop on Data-Flow Execution Models for Extreme Scale Computing.
[25] Katherine A. Yelick,et al. Portable Runtime Support for Asynchronous Simulation , 1995, ICPP.
[26] Joseph E. Flaherty,et al. A hierarchical partition model for adaptive finite element computation , 2000 .
[27] Keshav Pingali,et al. Data movement and control substrate for parallel adaptive applications , 2002, Concurr. Comput. Pract. Exp..
[28] T. von Eicken,et al. Parallel programming in Split-C , 1993, Supercomputing '93.
[29] Pietro Cicotti. Tarragon : a programming model for latency-hiding scientific computations , 2011 .
[30] Padma Raghavan,et al. A New Data-Mapping Scheme for Latency-Tolerant Distributed Sparse Triangular Solution , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[31] William L. Briggs,et al. A multigrid tutorial , 1987 .
[32] Eduard Ayguadé,et al. Overlapping communication and computation by using a hybrid MPI/SMPSs approach , 2010, ICS '10.
[33] Scott B. Baden,et al. Bamboo -- Translating MPI applications to a latency-tolerant, data-driven form , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[34] Jack J. Dongarra,et al. The LINPACK Benchmark: past, present and future , 2003, Concurr. Comput. Pract. Exp..
[35] Scott B. Baden,et al. Hiding Communication Latency with Non-SPMD, Graph-Based Execution , 2009, ICCS.
[36] Michael J. Holst,et al. A New Paradigm for Parallel Adaptive Meshing Algorithms , 2000, SIAM J. Sci. Comput..
[37] Paul D. Hovland,et al. Data-Flow Analysis for MPI Programs , 2006, 2006 International Conference on Parallel Processing (ICPP'06).
[38] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, HiPC 2008.
[39] Jack J. Dongarra,et al. The LINPACK Benchmark: An Explanation , 1988, ICS.
[40] P. Wesseling,et al. Geometric multigrid with applications to computational fluid dynamics , 2001 .
[41] Joel H. Saltz,et al. Distributed processing of very large datasets with DataCutter , 2001, Parallel Comput..
[42] Samuel Williams,et al. Optimization of geometric multigrid for emerging multi- and manycore processors , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[43] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[44] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..
[45] Erik H. D'Hollander,et al. Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August - 3 September 2011, Ghent, Belgium , 2012, PARCO.
[46] Martin Schulz,et al. Using MPI Communication Patterns to Guide Source Code Transformations , 2008, ICCS.
[47] Scott B. Baden,et al. Asynchronous programming with Tarragon , 2006, SC.
[48] D. Martin Swany,et al. Transformations to Parallel Codes for Communication-Computation Overlap , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[49] Tiarajú Asmuz Diverio,et al. Automatic data-flow graph generation of MPI programs , 2005, 17th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'05).
[50] Ken Kennedy,et al. KELPIO: a telescope-ready domain-specific I/O library for irregular block-structured applications , 2002, Future Gener. Comput. Syst..
[51] Laxmikant V. Kalé,et al. Mapping Dense LU Factorization on Multicore Supercomputer Nodes , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[52] Michael J. Quinn,et al. Parallel programming in C with MPI and OpenMP , 2003 .
[53] Calvin Lin,et al. An annotation language for optimizing software libraries , 1999, DSL '99.
[54] Katherine A. Yelick,et al. Communication optimizations for fine-grained UPC applications , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[55] Aslak Tveito,et al. Numerical Solution of Partial Differential Equations on Parallel Computers (Lecture Notes in Computational Science and Engineering) , 2006 .
[56] Rupak Biswas,et al. Communication Studies of DMP and SMP Machines , 1997 .
[57] Vipin Kumar,et al. Highly Scalable Parallel Algorithms for Sparse Matrix Factorization , 1997, IEEE Trans. Parallel Distributed Syst..
[58] P. Colella,et al. A local corrections algorithm for solving Poisson’s equation in three dimensions , 2006 .
[59] Arun K. Somani,et al. Minimizing overhead in parallel algorithms through overlapping communication/computation , 1997 .
[60] Yifeng Chen,et al. Large-scale FFT on GPU clusters , 2010, ICS '10.
[61] Scott B. Baden,et al. LU Factorization: Towards Hiding Communication Overheads with a Lookahead-Free Algorithm , 2015, 2015 IEEE International Conference on Cluster Computing.
[62] Arvind,et al. Executing a Program on the MIT Tagged-Token Dataflow Architecture , 1990, IEEE Trans. Computers.