Exploration of automatic optimization for CUDA programming
[1] Jie Cheng, et al. Programming Massively Parallel Processors: A Hands-on Approach, 2010, Scalable Comput. Pract. Exp.
[2] Wen-mei W. Hwu, et al. CUDA-Lite: Reducing GPU Programming Complexity, 2008, LCPC.
[3] Robert G. Belleman, et al. High Performance Direct Gravitational N-body Simulations on Graphics Processing Units, 2007, ArXiv.
[4] Jack J. Dongarra, et al. Dense linear algebra solvers for multicore with GPU accelerators, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[5] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, HiPC 2008.
[6] Gabe Rudy, et al. CUDA-CHiLL: A programming language interface for GPGPU optimizations and code generation, 2010.
[7] Klaus Mueller, et al. Why do commodity graphics hardware boards (GPUs) work so well for acceleration of computed tomography?, 2007, Electronic Imaging.
[8] Tarek S. Abdelrahman, et al. hiCUDA: a high-level directive-based language for GPU programming, 2009, GPGPU-2.
[9] Kevin Skadron, et al. A performance study of general-purpose applications on graphics processors using CUDA, 2008.
[10] Rudolf Eigenmann, et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization, 2009, PPoPP '09.
[11] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Kevin Skadron, et al. A performance study of general-purpose applications on graphics processors using CUDA, 2008, J. Parallel Distributed Comput.
[13] Jens H. Krüger, et al. A Survey of General‐Purpose Computation on Graphics Hardware, 2007, Eurographics.
[14] Victor Eijkhout, et al. Self-Adapting Linear Algebra Algorithms and Software, 2005, Proceedings of the IEEE.
[15] Simon Portegies Zwart, et al. High-performance direct gravitational N-body simulations on graphics processing units, 2007, astro-ph/0702058.
[16] Emmanuel Agullo, et al. QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.