From tile algorithm to stripe algorithm: a CUBLAS-based parallel implementation on GPUs of Gauss method for the resolution of extremely large dense linear systems stored on an array of solid state devices
[1] Jack J. Dongarra, et al. Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting, 2014, Concurr. Comput. Pract. Exp.
[2] Robert A. van de Geijn, et al. The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, 2012, J. Parallel Distributed Comput.
[3] Emmanuel Agullo, et al. LU factorization for accelerator-based systems, 2011, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA).
[4] Jack J. Dongarra, et al. Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[5] Eric J. Kelmelis, et al. CULA: hybrid GPU accelerated linear algebra routines, 2010, Defense + Commercial Sensing.
[6] Emmanuel Agullo, et al. Tile QR factorization with parallel panel processing for multicore architectures, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[7] Jack J. Dongarra, et al. Dense linear algebra solvers for multicore with GPU accelerators, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[8] Jack J. Dongarra, et al. Scheduling dense linear algebra operations on multicore processors, 2010, Concurr. Comput. Pract. Exp.
[9] Cui Yan, et al. An Optimization Load Balancing Algorithm Design in Massive Storage System, 2009, 2009 International Conference on Environmental Science and Information Application Technology.
[10] Jack J. Dongarra, et al. Towards dense linear algebra for hybrid GPU accelerated manycore systems, 2009, Parallel Comput.
[11] J. Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.
[13] Steven Skiena, et al. Optimizing triangle strips for fast rendering, 1996, Proceedings of Seventh Annual IEEE Visualization '96.
[14] Michele Colajanni, et al. Unifying and Optimizing Parallel Linear Algebra Algorithms, 1993, IEEE Trans. Parallel Distributed Syst.
[15] L. Trefethen, et al. Average-case stability of Gaussian elimination, 1990.
[16] Michel Cosnard, et al. Gaussian Elimination on Message Passing Architecture, 1987, ICS.
[17] D. Marquardt. An Algorithm for Least-Squares Estimation of Nonlinear Parameters, 1963.
[18] Robert A. van de Geijn, et al. BLAS (Basic Linear Algebra Subprograms), 2011, Encyclopedia of Parallel Computing.
[19] Bowen Alpern, et al. Hierarchical Tiling: A Methodology for High Performance, 1996.