Tiled QR Decomposition and Its Optimization on CPU and GPU Computing System
[1] Yves Robert, et al. Tiled QR factorization algorithms, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[2] James Demmel, et al. Communication-Avoiding QR Decomposition for GPUs, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[3] Emmanuel Agullo, et al. QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[4] Alston S. Householder, et al. Unitary Triangularization of a Nonsymmetric Matrix, 1958, JACM.
[5] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.
[6] Robert A. van de Geijn, et al. Retargeting PLAPACK to clusters with hardware accelerators, 2010, 2010 International Conference on High Performance Computing & Simulation.
[7] Jack J. Dongarra, et al. Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems, 2012, ICS '12.
[8] Jack J. Dongarra, et al. Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[9] Robert A. van de Geijn, et al. Solving dense linear systems on platforms with multiple hardware accelerators, 2009, PPoPP '09.