Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression
David E. Keyes | Hatem Ltaief | George M. Turkiyyah | Wajih Halim Boukaram
[1] Jack J. Dongarra,et al. Scheduling dense linear algebra operations on multicore processors , 2010, Concurr. Comput. Pract. Exp..
[2] Jacob Barhen,et al. Singular value decomposition utilizing parallel algorithms on graphical processors , 2011, OCEANS'11 MTS/IEEE KONA.
[3] Max Grossman,et al. Professional CUDA C Programming , 2014 .
[4] Gabriel Oksa,et al. Efficient pre-processing in the parallel block-Jacobi SVD algorithm , 2006, Parallel Comput..
[5] G. G. Stokes. "J." , 1890, The New Yale Book of Quotations.
[6] James Demmel,et al. Communication-Avoiding QR Decomposition for GPUs , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[7] W. Hackbusch. A sparse matrix arithmetic based on H-matrices. Part I , 1999 .
[8] Richard P. Brent,et al. A Parallel Ring Ordering Algorithm for Efficient One-Sided Jacobi SVD Computations , 1997, J. Parallel Distributed Comput..
[9] James Demmel,et al. Jacobi's Method is More Accurate than QR , 1989, SIAM J. Matrix Anal. Appl..
[10] Wolfgang Hackbusch,et al. A Sparse Matrix Arithmetic Based on H-Matrices. Part I: Introduction to H-Matrices , 1999, Computing.
[11] Marián Vajtersic,et al. Block-Jacobi SVD Algorithms for Distributed Memory Systems II: Meshes , 1999, Parallel Algorithms Appl..
[12] Marián Vajtersic,et al. Block-Jacobi SVD Algorithms for Distributed Memory Systems I: Hypercubes and Rings , 1999, Parallel Algorithms Appl..
[13] Nicholas Wilt,et al. The CUDA Handbook: A Comprehensive Guide to GPU Programming , 2013 .
[14] Philipp Birken,et al. Numerical Linear Algebra , 2011, Encyclopedia of Parallel Computing.
[15] Hatem Ltaief,et al. Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs , 2019, ACM Trans. Math. Softw..
[16] Richard P. Brent,et al. On parallel implementation of the one-sided Jacobi algorithm for singular value decompositions , 1995, Proceedings Euromicro Workshop on Parallel and Distributed Processing.
[17] Nathan Halko,et al. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions , 2009, SIAM Rev..
[18] Jack J. Dongarra,et al. Optimization for performance and energy for batched matrix computations on GPUs , 2015, GPGPU@PPoPP.
[19] Jack J. Dongarra,et al. Dense linear algebra solvers for multicore with GPU accelerators , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[20] Jack J. Dongarra,et al. A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations , 2015, ISC.
[21] Steffen Börm,et al. Approximating Gaussian Processes with H2-Matrices , 2007, ECML.
[22] W. Hackbusch,et al. Hierarchical Matrices: Algorithms and Analysis , 2015 .
[23] Gene H. Golub,et al. Matrix computations , 1983 .
[24] Martin Bečka,et al. New Dynamic Orderings for the Parallel One-Sided Block-Jacobi SVD Algorithm , 2015, Parallel Process. Lett..
[25] Che-Rung Lee,et al. Improving Performance of Convolutional Neural Networks by Separable Filters on GPU , 2015, Euro-Par.
[26] W. Hackbusch,et al. On H2-Matrices , 2000 .
[27] Boris N. Khoromskij,et al. A Sparse H-Matrix Arithmetic. Part II: Application to Multi-Dimensional Problems , 2000, Computing.
[28] Luciano de Paula,et al. Many SVDs on GPU for Image Mosaic Assemble , 2015, 2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW).
[29] Wolfgang Hackbusch,et al. Construction and Arithmetics of H-Matrices , 2003, Computing.