Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures
[1] Jack J. Dongarra,et al. Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing , 2010, Parallel Comput..
[2] Enrique S. Quintana-Ortí,et al. Reduction to Condensed Forms for Symmetric Eigenvalue Problems on Multi-core Architectures , 2009, PPAM.
[3] John A. Sharp,et al. Data flow computing: theory and practice , 1992 .
[4] B. Kågström,et al. Blocked algorithms for the reduction to Hessenberg-triangular form revisited , 2008 .
[5] Jack J. Dongarra,et al. Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[6] Serge G. Petiton,et al. Workflow Global Computing with YML , 2006, 2006 7th IEEE/ACM International Conference on Grid Computing.
[7] Robert H. Halstead,et al. Matrix Computations , 2011, Encyclopedia of Parallel Computing.
[8] Julien Langou,et al. The Impact of Multicore on Math Software , 2006, PARA.
[9] Message Passing Interface Forum. MPI: A message-passing interface standard , 1994 .
[10] Christian H. Bischof,et al. Algorithm 807: The SBR Toolbox—software for successive band reduction , 2000, TOMS.
[11] R. Dolbeau,et al. HMPP™: A Hybrid Multi-core Parallel Programming Environment , 2022 .
[12] Rajkumar Buyya,et al. A Taxonomy of Workflow Management Systems for Grid Computing , 2005, Proceedings of the 38th Annual Hawaii International Conference on System Sciences.
[13] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[14] Jack Dongarra,et al. Parallel Block Hessenberg Reduction using Algorithms-By-Tiles for Multicore Architectures Revisited , 2009 .
[15] Philipp Birken,et al. Numerical Linear Algebra , 2011, Encyclopedia of Parallel Computing.
[16] Julien Langou,et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures , 2007, Parallel Comput..
[17] Horst D. Simon,et al. The solution of large dense generalized eigenvalue problems on the Cray X-MP/24 with SSD , 1987 .
[18] Carl Kesselman,et al. Generalized communicators in the Message Passing Interface , 1996, Proceedings. Second MPI Developer's Conference.
[19] Jack J. Dongarra,et al. Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization , 2008, IEEE Transactions on Parallel and Distributed Systems.
[20] Jack Dongarra,et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects , 2009 .
[21] Jack Dongarra,et al. ScaLAPACK Users' Guide , 1987 .
[22] Jack Dongarra,et al. QR Factorization for the CELL Processor , 2008 .
[23] Jack J. Dongarra,et al. Scheduling two-sided transformations using tile algorithms on multicore architectures , 2010, Sci. Program..
[24] Ken Kennedy,et al. Automatic blocking of QR and LU factorizations for locality , 2004, MSP '04.
[25] Robert A. van de Geijn,et al. Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures , 2007, SPAA '07.
[26] Fred G. Gustavson,et al. High Performance Computing with the Cell Broadband Engine , 2009, Sci. Program..
[27] Emmanuel Jeannot,et al. Automatic Parallelization Techniques Based on Compact DAG Extraction and Symbolic Scheduling , 2001, Parallel Process. Lett..
[28] R. Martin,et al. Electronic Structure: Basic Theory and Practical Methods , 2004 .
[29] Gene H. Golub,et al. Matrix computations (3rd ed.) , 1996 .
[30] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[31] Jack Dongarra,et al. Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization , 2008 .
[32] T. Davis,et al. Algorithm 8xx: PIRO_BAND, Pipelined Plane Rotations for Blocked Band Reduction , 2009 .
[33] Alex Rapaport,et al. MPI-2: Extensions to the Message-Passing Interface , 1997 .
[34] Jack Dongarra,et al. PVM: Parallel virtual machine: a users' guide and tutorial for networked parallel computing , 1995 .
[35] Julien Langou,et al. Parallel tiled QR factorization for multicore architectures , 2007, Concurr. Comput. Pract. Exp..
[36] Jesús Labarta,et al. A dependency-aware task-based programming environment for multi-core architectures , 2008, 2008 IEEE International Conference on Cluster Computing.
[37] Notker Rösch,et al. ParaGauss: The Density Functional Program ParaGauss for Complex Systems in Chemistry , 2005 .
[38] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[39] Arndt Bode,et al. High Performance Computing in Science and Engineering, Garching 2004 , 2005 .
[40] Jack Dongarra,et al. MPI: The Complete Reference , 1996 .
[41] Emmanuel Agullo,et al. Comparative study of one-sided factorizations with multiple software packages on multi-core hardware , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[42] Jaeyoung Choi,et al. Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines , 1994, Sci. Program..
[43] Robert A. van de Geijn,et al. Updating an LU Factorization with Pivoting , 2008, TOMS.