Fully Empirical Autotuned QR Factorization For
