DAGuE: A Generic Distributed DAG Engine for High Performance Computing
Thomas Hérault | George Bosilca | Jack J. Dongarra | Anthony Danalis | Aurelien Bouteiller | Pierre Lemarinier
[1] Lars Karlsson, et al. Distributed SBP Cholesky factorization algorithms with near-optimal scheduling, 2009, TOMS.
[2] Serge G. Petiton, et al. Workflow Global Computing with YML, 2006, 2006 7th IEEE/ACM International Conference on Grid Computing.
[3] Emmanuel Jeannot, et al. Automatic Multithreaded Parallel Program Generation for Message Passing Multiprocessors Using Parameterized Task Graphs, 2002.
[4] Robert A. van de Geijn, et al. Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, 2007, SPAA '07.
[5] Jack Dongarra, et al. Parallel tiled QR factorization for multicore architectures, 2008.
[6] C. Loan, et al. A Storage-Efficient $WY$ Representation for Products of Householder Transformations, 1989.
[7] Guillaume Mercier, et al. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[8] James Demmel, et al. ScaLAPACK: A Linear Algebra Library for Message-Passing Computers, 1997, PPSC.
[9] Emmanuel Jeannot, et al. Automatic Parallelization Techniques Based on Compact DAG Extraction and Symbolic Scheduling, 2001, Parallel Process. Lett.
[10] John A. Gunnels, et al. Minimal Data Copy for Dense Linear Algebra Factorization, 2006, PARA.
[11] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.
[12] Jack Dongarra, et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.
[13] Katherine A. Yelick, et al. Multi-threading and one-sided communication in parallel LU factorization, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[14] William Pugh, et al. The Omega test: A fast and practical integer programming algorithm for dependence analysis, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[15] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[16] Robert A. van de Geijn, et al. Updating an LU Factorization with Pivoting, 2008, TOMS.
[17] Robert A. van de Geijn, et al. SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks, 2008, PPoPP.
[18] Julien Langou, et al. The Impact of Multicore on Math Software, 2006, PARA.
[19] Rajkumar Buyya, et al. A Taxonomy of Workflow Management Systems for Grid Computing, 2005, Proceedings of the 38th Annual Hawaii International Conference on System Sciences.
[20] Emmanuel Jeannot, et al. Compact DAG representation and its symbolic scheduling, 1999, J. Parallel Distributed Comput.
[21] Arthur J. Bernstein, et al. Analysis of Programs for Parallel Processing, 1966, IEEE Trans. Electron. Comput.
[22] G. W. Stewart, et al. Matrix algorithms, 1998.
[23] Peter J. Denning, et al. Operating Systems Theory, 1973.
[24] Jesús Labarta, et al. A dependency-aware task-based programming environment for multi-core architectures, 2008, 2008 IEEE International Conference on Cluster Computing.
[25] Guy E. Blelloch, et al. The data locality of work stealing, 2000, SPAA.
[26] James Demmel, et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance, 1995, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[27] John Shalf, et al. The International Exascale Software Project roadmap, 2011, Int. J. High Perform. Comput. Appl.
[28] Jack J. Dongarra, et al. Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[29] Franck Cappello, et al. Grid'5000: A Large Scale And Highly Reconfigurable Experimental Grid Testbed, 2006, Int. J. High Perform. Comput. Appl.
[30] John A. Sharp, et al. Data flow computing: theory and practice, 1992.
[31] Jack J. Dongarra, et al. The LINPACK Benchmark: past, present and future, 2003, Concurr. Comput. Pract. Exp.
[32] George Bosilca, et al. Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project, 2010.