Automatic translation of MPI source into a latency-tolerant, data-driven form
Lawrence Berkeley National Laboratory, Recent Work
[3] Jack B. Dennis,et al. Data Flow Supercomputers , 1980, Computer.
[4] Jack Dongarra,et al. LINPACK Users' Guide , 1987 .
[5] William L. Briggs,et al. A multigrid tutorial , 1987 .
[6] Jack J. Dongarra,et al. The LINPACK Benchmark: An Explanation , 1988, ICS.
[7] Arvind,et al. Executing a Program on the MIT Tagged-Token Dataflow Architecture , 1990, IEEE Trans. Computers.
[8] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[9] Katherine A. Yelick,et al. Portable Runtime Support for Asynchronous Simulation , 1995, ICPP.
[10] Vipin Kumar,et al. Highly Scalable Parallel Algorithms for Sparse Matrix Factorization , 1997, IEEE Trans. Parallel Distributed Syst.
[11] Arun K. Somani,et al. Minimizing overhead in parallel algorithms through overlapping communication/computation , 1997 .
[12] Rupak Biswas,et al. Communication Studies of DMP and SMP Machines , 1997 .
[13] Scott B. Baden,et al. Communication overlap in multi-tier parallel algorithms , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[14] Dale R. Shires,et al. Program Flow Graph Construction For Static Analysis of MPI Programs , 1999, PDPTA.
[15] Calvin Lin,et al. An annotation language for optimizing software libraries , 1999, DSL '99.
[16] Michael J. Holst,et al. A New Paradigm for Parallel Adaptive Meshing Algorithms , 2000, SIAM J. Sci. Comput.
[17] D. Marx. Ab initio molecular dynamics: Theory and Implementation , 2000 .
[18] Joseph E. Flaherty,et al. A hierarchical partition model for adaptive finite element computation , 2000 .
[19] P. Wesseling,et al. Geometric multigrid with applications to computational fluid dynamics , 2001 .
[20] Ken Kennedy,et al. KelpIO: a telescope-ready domain-specific I/O library for irregular block-structured applications , 2001, Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid.
[21] Joel H. Saltz,et al. Distributed processing of very large datasets with DataCutter , 2001, Parallel Comput.
[22] Keshav Pingali,et al. Data movement and control substrate for parallel adaptive applications , 2002, Concurr. Comput. Pract. Exp.
[23] Markus Schordan,et al. Treating a user-defined parallel library as a domain-specific language , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[24] Padma Raghavan,et al. A New Data-Mapping Scheme for Latency-Tolerant Distributed Sparse Triangular Solution , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[25] Padma Raghavan,et al. A latency tolerant hybrid sparse solver using incomplete Cholesky factorization , 2003, Numer. Linear Algebra Appl.
[26] Laxmikant V. Kalé,et al. Adaptive MPI , 2003, LCPC.
[27] Michael J. Quinn,et al. Parallel programming in C with MPI and OpenMP , 2003 .
[28] M. Clemens,et al. Geometric multigrid method for electro- and magnetostatic field simulations using the conformal finite integration technique , 2003 .
[29] Jack J. Dongarra,et al. The LINPACK Benchmark: past, present and future , 2003, Concurr. Comput. Pract. Exp.
[30] Katherine A. Yelick,et al. Communication optimizations for fine-grained UPC applications , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[31] Ken Kennedy,et al. Telescoping Languages: A System for Automatic Generation of Domain Languages , 2005, Proceedings of the IEEE.
[32] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl.
[33] Tiarajú Asmuz Diverio,et al. Automatic data-flow graph generation of MPI programs , 2005, 17th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'05).
[34] D. Martin Swany,et al. Transformations to Parallel Codes for Communication-Computation Overlap , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[35] Aslak Tveito,et al. Numerical solution of partial differential equations on parallel computers , 2006 .
[36] Paul D. Hovland,et al. Data-Flow Analysis for MPI Programs , 2006, 2006 International Conference on Parallel Processing (ICPP'06).
[37] P. Colella,et al. A local corrections algorithm for solving Poisson’s equation in three dimensions , 2006 .
[38] Katherine A. Yelick,et al. Multi-threading and one-sided communication in parallel LU factorization , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[39] J. Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[40] Vivek Sarkar,et al. Software challenges in extreme scale systems , 2009 .
[41] Scott B. Baden,et al. Hiding Communication Latency with Non-SPMD, Graph-Based Execution , 2009, ICCS.
[42] Eduard Ayguadé,et al. Overlapping communication and computation by using a hybrid MPI/SMPSs approach , 2010, ICS '10.
[43] Yifeng Chen,et al. Large-scale FFT on GPU clusters , 2010, ICS '10.
[44] John Shalf,et al. Exascale Computing Technology Challenges , 2010, VECPAR.
[45] Sayantan Sur,et al. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters , 2011, Computer Science - Research and Development.
[46] Pietro Cicotti. Tarragon : a programming model for latency-hiding scientific computations , 2011 .
[47] Scott B. Baden,et al. Latency Hiding and Performance Tuning with Graph-Based Execution , 2011, 2011 First Workshop on Data-Flow Execution Models for Extreme Scale Computing.
[48] Thomas Hérault,et al. DAGuE: A Generic Distributed DAG Engine for High Performance Computing , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[49] Laxmikant V. Kalé,et al. Mapping Dense LU Factorization on Multicore Supercomputer Nodes , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[50] Erik H. D'Hollander,et al. Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August - 3 September 2011, Ghent, Belgium , 2012, PARCO.
[51] Scott B. Baden,et al. Bamboo -- Translating MPI applications to a latency-tolerant, data-driven form , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[52] S. Baden,et al. Bamboo - Preliminary scaling results on multiple hybrid nodes of Knights Corner and Sandy Bridge processors , 2013 .
[53] Vivek Sarkar,et al. Integrating Asynchronous Task Parallelism with MPI , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.