Fully Empirical Autotuned QR Factorization For Multicore Architectures

Emmanuel Agullo | Jack Dongarra | Stanimire Tomov | Rajib Nath

[1] Richard W. Vuduc, et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs, 2010, PPoPP '10.

[2] E. Kay, et al. Graph Theory. An Algorithmic Approach, 1975.

[3] Steven G. Johnson, et al. FFTW: an adaptive software architecture for the FFT, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[4] Herb Sutter, et al. A Fundamental Turn Toward Concurrency in Software, 2008.

[5] Jack J. Dongarra, et al. Dense linear algebra solvers for multicore with GPU accelerators, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).

[6] Emmanuel Agullo, et al. Tile QR factorization with parallel panel processing for multicore architectures, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[7] Robert A. van de Geijn, et al. Programming matrix algorithms-by-blocks for thread-level parallelism, 2009, TOMS.

[8] Alan Edelman, et al. Autotuning multigrid with PetaBricks, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[9] Thomas Hérault, et al. DAGuE: A Generic Distributed DAG Engine for High Performance Computing, 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.

[10] Emmanuel Agullo, et al. Comparative study of one-sided factorizations with multiple software packages on multi-core hardware, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[11] Alan Edelman, et al. PetaBricks: a language and compiler for algorithmic choice, 2009, PLDI '09.

[12] Yuefan Deng, et al. New trends in high performance computing, 2001, Parallel Computing.

[13] R. C. Whaley, et al. Towards interoperability: a wrapper model for integrating remote laboratories in a collaborative discovery learning environment, 2008.

[14] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, HiPC 2008.

[15] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.

[16] Emmanuel Agullo, et al. QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.