Dynamically Balanced Synchronization-Avoiding LU Factorization with Multicore and GPUs
[1] Matemática, et al. Society for Industrial and Applied Mathematics, 2010.
[2] Laura Grigori, et al. Adapting communication-avoiding LU and QR factorizations to multicore architectures, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[3] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[4] Dinesh Manocha, et al. LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware, 2005, ACM/IEEE SC 2005 Conference (SC'05).
[5] Jack Dongarra, et al. A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures, 2011, 2011 Symposium on Application Accelerators in High-Performance Computing.
[6] James Demmel, et al. LAPACK Users' Guide, Third Edition, 1999, Software, Environments and Tools.
[7] Laura Grigori, et al. A Class of Communication-avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines, 2012, ICCS.
[8] J. Demmel, et al. Implementing Communication-Optimal Parallel and Sequential QR Factorizations, 2008, arXiv:0809.2407.
[9] Jack J. Dongarra, et al. Exploiting Fine-Grain Parallelism in Recursive LU Factorization, 2011, PARCO.
[10] Jack J. Dongarra, et al. An Improved Magma Gemm For Fermi Graphics Processing Units, 2010, Int. J. High Perform. Comput. Appl.
[11] Jack J. Dongarra, et al. LU Factorization with Partial Pivoting for a Multicore System with Accelerators, 2013, IEEE Transactions on Parallel and Distributed Systems.
[12] William Gropp, et al. Hybrid Static/dynamic Scheduling for Already Optimized Dense Matrix Factorization, 2011, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[13] Jack Dongarra, et al. Some issues in dense linear algebra for multicore and special purpose architectures, 2008.
[14] Jack J. Dongarra, et al. Multi-GPU Implementation of LU Factorization, 2012, ICCS.
[15] Jack J. Dongarra, et al. Accelerating GPU Kernels for Dense Linear Algebra, 2010, VECPAR.
[16] James Demmel, et al. Communication Avoiding Gaussian elimination, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[17] Fred G. Gustavson, et al. Recursion leads to automatic variable blocking for dense linear-algebra algorithms, 1997, IBM J. Res. Dev.
[18] Jack J. Dongarra, et al. Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems, 2012, ICS '12.
[19] Pradeep Dubey, et al. Designing and dynamically load balancing hybrid LU for multi/many-core, 2011, Computer Science - Research and Development.
[20] Weichung Wang, et al. Tuning Block Size for QR Factorization on CPU-GPU Hybrid Systems, 2012, 2012 IEEE 6th International Symposium on Embedded Multicore SoCs.