A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem
James Demmel | Torsten Hoefler | Grey Ballard | Edgar Solomonik
[1] Ramesh C. Agarwal, et al. A three-dimensional approach to parallel matrix multiplication, 1995, IBM J. Res. Dev.
[2] Jack J. Dongarra, et al. Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[3] A. Tiskin. Bulk-Synchronous Parallel Gaussian Elimination, 2002.
[4] Sartaj Sahni, et al. Parallel Matrix and Graph Algorithms, 1981, SIAM J. Comput.
[5] Tze Meng Low, et al. Accumulating Householder transformations, revisited, 2006, TOMS.
[6] Leslie G. Valiant. A bridging model for parallel computation, 1990, CACM.
[7] Matteo Frigo, et al. Cache-oblivious algorithms, 1999, 40th Annual Symposium on Foundations of Computer Science.
[8] Alexander Tiskin, et al. Memory-Efficient Matrix Multiplication in the BSP Model, 1999, Algorithmica.
[9] James Demmel, et al. Reconstructing Householder Vectors from Tall-Skinny QR, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[10] D. Sorensen, et al. Block reduction of matrices to condensed forms for eigenvalue computations, 1990.
[11] James Demmel, et al. Avoiding Communication in Successive Band Reduction, 2015, ACM Trans. Parallel Comput.
[12] James Demmel, et al. Communication-optimal Parallel and Sequential QR and LU Factorizations, 2008, SIAM J. Sci. Comput.
[13] Lukas Krämer, et al. Developing algorithms and software for the parallel solution of the symmetric eigenvalue problem, 2011, J. Comput. Sci.
[14] James Demmel, et al. Tradeoffs between synchronization, communication, and computation in parallel linear algebra computations, 2014, SPAA.
[15] G. Golub, et al. Parallel block schemes for large-scale least-squares computations, 1988.
[16] D. Hartree. The Wave Mechanics of an Atom with a non-Coulomb Central Field. Part III. Term Values and Intensities in Series in Optical Spectra, 1928, Mathematical Proceedings of the Cambridge Philosophical Society.
[17] Thomas Auckenthaler, et al. Highly scalable eigensolvers for petaflop applications, 2012.
[18] Jarle Berntsen, et al. Communication efficient matrix multiplication on hypercubes, 1989, Parallel Comput.
[19] Erik Elmroth, et al. New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems, 1998, PARA.
[20] James Demmel, et al. Minimizing Communication in Numerical Linear Algebra, 2009, SIAM J. Matrix Anal. Appl.
[21] P. Strazdins. A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization, 1998.
[22] Robert A. van de Geijn, et al. Reduction to condensed form for the eigenvalue problem on distributed memory architectures, 1992, Parallel Comput.
[23] Jack Dongarra, et al. ScaLAPACK Users' Guide, 1997.
[24] Inderjit S. Dhillon, et al. The design and implementation of the MRRR algorithm, 2006, TOMS.
[25] Robert A. van de Geijn, et al. SUMMA: scalable universal matrix multiplication algorithm, 1995, Concurr. Pract. Exp.
[26] Rudnei Dias da Cunha, et al. New Parallel (Rank-Revealing) QR Factorization Algorithms, 2002, Euro-Par.
[27] Robert A. van de Geijn, et al. Parallel out-of-core computation and updating of the QR factorization, 2005, TOMS.
[28] James Demmel, et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms, 2011, Euro-Par.
[29] S. Lennart Johnsson, et al. Minimizing the Communication Time for Matrix Multiplication on Multiprocessors, 1993, Parallel Comput.
[30] V. Fock, et al. Näherungsmethode zur Lösung des quantenmechanischen Mehrkörperproblems, 1930.
[31] Bruno Lang. A Parallel Algorithm for Reducing Symmetric Banded Matrices to Tridiagonal Form, 1993, SIAM J. Sci. Comput.
[32] Christian H. Bischof, et al. Algorithm 807: The SBR Toolbox—software for successive band reduction, 2000, TOMS.
[33] Edgar Solomonik. Provably Efficient Algorithms for Numerical Tensor Algebra, 2014.
[34] Alexander Tiskin. Communication-efficient parallel generic pairwise elimination, 2007, Future Gener. Comput. Syst.
[35] H. T. Kung, et al. I/O complexity: The red-blue pebble game, 1981, STOC '81.
[36] James Demmel, et al. Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[37] Christian H. Bischof, et al. A framework for symmetric band reduction, 2000, TOMS.
[38] James Demmel, et al. Improving communication performance in dense linear algebra via topology aware collectives, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[39] D. Hartree. The Wave Mechanics of an Atom with a Non-Coulomb Central Field. Part I. Theory and Methods, 1928, Mathematical Proceedings of the Cambridge Philosophical Society.
[40] Emmanuel Jeannot, et al. Euro-Par 2011 Parallel Processing, 2011, Lecture Notes in Computer Science.
[41] Alok Aggarwal, et al. Communication Complexity of PRAMs, 1990, Theor. Comput. Sci.