The GPU Computing Revolution: From Multi-Core CPUs To Many-Core Graphics Processors

[1]  Jack Dongarra, et al.  Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.

[2]  Jeffrey S. Vetter.  Toward exascale computational science with heterogeneous processing, 2010, GPGPU-3.

[3]  Michael B. Giles, et al.  Multigrid aircraft computations using the OPlus parallel library, 1996.

[4]  Collin McCurdy, et al.  The Scalable Heterogeneous Computing (SHOC) benchmark suite, 2010, GPGPU-3.

[5]  Simon McIntosh-Smith, et al.  Energy-aware metrics for benchmarking heterogeneous systems, 2011, PERV.

[6]  Mark Bull, et al.  Development of mixed mode MPI / OpenMP applications, 2001, Sci. Program.

[7]  Per Brinch Hansen, et al.  Model programs for computational science: A programming methodology for multicomputers, 1993, Concurr. Pract. Exp.

[8]  Barbara Chapman, et al.  Using OpenMP - portable shared memory parallel programming, 2007, Scientific and engineering computation.

[9]  Jack J. Dongarra, et al.  A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators, 2010, VECPAR.

[10]  Samuel Williams, et al.  The Landscape of Parallel Computing Research: A View from Berkeley, 2006.

[11]  David Kaeli, et al.  Heterogeneous Computing with OpenCL, 2011.

[12]  Paul H. J. Kelly, et al.  Performance Analysis and Optimization of the OP2 Framework on Many-Core Architectures, 2012, Comput. J.

[13]  Michael J. Flynn, et al.  Some Computer Organizations and Their Effectiveness, 1972, IEEE Transactions on Computers.

[14]  Timothy G. Mattson, et al.  Patterns for parallel programming, 2004.

[15]  Anthony Skjellum, et al.  Using MPI - portable parallel programming with the message-passing interface, 1994.

[16]  Vijay S. Pande, et al.  Accelerating molecular dynamic simulation on graphics processing units, 2009, J. Comput. Chem.

[17]  Hsien-Hsin S. Lee, et al.  Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era, 2008, Computer.

[18]  Simon McIntosh-Smith, et al.  Benchmarking Energy Efficiency, Power Costs and Carbon Emissions on Heterogeneous Systems, 2012, Comput. J.

[19]  Satoshi Matsuoka, et al.  Auto-tuning 3-D FFT library for CUDA GPUs, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[20]  Timothy G. Mattson, et al.  OpenCL Programming Guide, 2011.

[21]  Charles L. Lawson, et al.  Basic Linear Algebra Subprograms for Fortran Usage, 1979, TOMS.

[22]  G.E. Moore, et al.  Cramming More Components Onto Integrated Circuits, 1998, Proceedings of the IEEE.