Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs
Yi Yang | Huiyang Zhou | Ping Xiang | Mike Mantor