From tile algorithm to stripe algorithm: a CUBLAS-based parallel implementation on GPUs of Gauss method for the resolution of extremely large dense linear systems stored on an array of solid state devices

Manuel Carcenac

[1] Jack J. Dongarra, et al. Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting, 2014, Concurr. Comput. Pract. Exp.

[2] Robert A. van de Geijn, et al. The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, 2012, J. Parallel Distributed Comput.

[3] Emmanuel Agullo, et al. LU factorization for accelerator-based systems, 2011, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA).

[4] Jack J. Dongarra, et al. Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures, 2011, Concurr. Comput. Pract. Exp.

[5] Eric J. Kelmelis, et al. CULA: hybrid GPU accelerated linear algebra routines, 2010, Defense + Commercial Sensing.

[6] Emmanuel Agullo, et al. Tile QR factorization with parallel panel processing for multicore architectures, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[7] Jack J. Dongarra, et al. Dense linear algebra solvers for multicore with GPU accelerators, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).

[8] Jack J. Dongarra, et al. Scheduling dense linear algebra operations on multicore processors, 2010, Concurr. Comput. Pract. Exp.

[9] Cui Yan, et al. An Optimization Load Balancing Algorithm Design in Massive Storage System, 2009, 2009 International Conference on Environmental Science and Information Application Technology.

[10] Jack J. Dongarra, et al. Towards dense linear algebra for hybrid GPU accelerated manycore systems, 2009, Parallel Comput.

[11] J. Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.

[12] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.

[13] Steven Skiena, et al. Optimizing triangle strips for fast rendering, 1996, Proceedings of Seventh Annual IEEE Visualization '96.

[14] Michele Colajanni, et al. Unifying and Optimizing Parallel Linear Algebra Algorithms, 1993, IEEE Trans. Parallel Distributed Syst.

[15] L. Trefethen, et al. Average-case stability of Gaussian elimination, 1990.

[16] Michel Cosnard, et al. Gaussian Elimination on Message Passing Architecture, 1987, ICS.

[17] D. Marquardt. An Algorithm for Least-Squares Estimation of Nonlinear Parameters, 1963.

[18] Robert A. van de Geijn, et al. BLAS (Basic Linear Algebra Subprograms), 2011, Encyclopedia of Parallel Computing.

[19] Bowen Alpern, et al. Hierarchical Tiling: A Methodology for High Performance, 1996.